The Matlab programming environment is often perceived as a platform suitable for prototyping and modeling but not for “serious” applications. One of the main complaints is that Matlab is just too slow.
Accelerating MATLAB Performance (CRC Press, ISBN 9781482211290, 785 pages) aims to correct this perception, by describing multiple ways to greatly improve Matlab program speed.
The book:
- Demonstrates how to profile MATLAB code for performance and resource usage, enabling users to focus on the program’s actual hotspots
- Considers tradeoffs in performance tuning, horizontal vs. vertical scalability, latency vs. throughput, and perceived vs. actual performance
- Explains generic speedup techniques used throughout the software industry and their adaptation for Matlab, plus methods specific to Matlab
- Analyzes the effects of various data types and processing functions
- Covers vectorization, parallelization (implicit and explicit), distributed computing, optimization, memory management, chunking, and caching
- Explains Matlab’s memory model and shows how to profile memory usage and optimize code to reduce memory allocations and data fetches
- Describes the use of GPU, MEX, FPGA, and other forms of compiled code
- Details acceleration techniques for GUI, graphics, I/O, Simulink, object-oriented Matlab, Matlab startup, and deployed applications
- Discusses a wide variety of MathWorks and third-party functions, utilities, libraries, and toolboxes that can help to improve performance
Ideal for novices and professionals alike, the book leaves no stone unturned. It covers all aspects of Matlab, taking a comprehensive approach to boosting Matlab performance. It is packed with thousands of helpful tips, code examples, and online references. Supported by this active website, the book will help readers rapidly attain significant reductions in development costs and program run times.
Use promo code MZK07 for a 25% discount and free worldwide shipping on crcpress.com
Reviews
… a very interesting new book on MATLAB® performance … covering basic tools and an appropriate range of specific programming techniques. The book seems to take a whole-system approach … helping readers understand the big picture of how to get better performance.
—Michelle Hirsch, Ph.D., Head of MATLAB Product Management, The MathWorks Inc.
Table of Contents
PREFACE
- CHAPTER 1: Introduction to Performance Tuning
- CHAPTER 2: Profiling MATLAB Performance
- CHAPTER 3: Standard Performance-Tuning Techniques
- CHAPTER 4: MATLAB-Specific Techniques
- CHAPTER 5: Implicit Parallelization (Vectorization and Indexing)
- CHAPTER 6: Explicit Parallelization Using MathWorks Toolboxes
  - 6.1 The Parallel Computing Toolbox – CPUs
  - 6.2 The Parallel Computing Toolbox – GPUs
  - 6.3 The MATLAB Distributed Computing Server
  - 6.4 Techniques for effective parallelization in MATLAB
- CHAPTER 7: Explicit Parallelization by Other Means
- CHAPTER 8: Using Compiled Code
- CHAPTER 9: Memory-Related Techniques
- CHAPTER 10: Graphics and GUI
- CHAPTER 11: I/O Techniques
- APPENDIX A: Additional Resources
- APPENDIX B: Performance Tuning Checklist
Book organization
This book is organized in chapters grouped by related functionality/usage. It is not necessary to read the book in order: the chapters and sections are mostly independent and stand alone. You can safely skip parts that you find difficult or uninteresting.
We begin with a theoretical description of performance tuning in Chapter 1. The discussion includes typical pitfalls, tradeoffs and considerations that need to be kept in mind before and during any tuning process.
Chapter 2 provides an overview of the tools available in MATLAB for diagnosing an application and determining the locations of, and reasons for, its performance hotspots. There are several ways to profile application run time in MATLAB, and different situations may call for different tools.
Chapters 3-11 discuss specific speedup techniques that can be used in MATLAB:
- Chapter 3 explains standard techniques adapted from non-MATLAB programming languages.
- Chapter 4 discusses techniques that are unique to MATLAB code.
- Chapter 5 discusses implicit parallelization, with indexing and vectorization.
- Chapters 6 and 7 discuss explicit parallelization using a variety of means (CPU, GPU and multi-threading).
- Chapter 8 discusses techniques for using compiled (binary) code.
- Chapter 9 discusses specific techniques that are memory-related. The non-trivial relationship between memory and performance is explained, and a variety of tuning techniques is presented in light of these explanations.
- Chapter 10 discusses techniques related to graphics, GUI and user interaction.
- Chapter 11 concludes the list of specific tuning techniques with a discussion of techniques related to I/O, particularly reading and writing files.
Chapters 3 through 11 are intended for use as a random-access reference. The sections and techniques can typically be used independently of each other. You can directly use any section or technique, without reading or using any other.
Appendix A presents online and offline resources that expand the information presented in the text and enable further research. Appendix B concludes the text by providing a non-comprehensive general checklist for performance-tuning.
Throughout the text, references are provided to enable interested readers to expand their knowledge of specific issues. Footnotes are used to clarify some points and to provide cross-references to other sections within this book; endnotes are used to provide references to related online resources. Most online references are provided in both full and shortened format, to enable easy usage when transcribed from hardcopy.
About the author
Yair Altman, author of the popular UndocumentedMatlab.com website, is well respected in the Matlab community as an expert on advanced MATLAB programming.
Yair’s first book, Undocumented Secrets of MATLAB-Java Programming, was published in 2011 to rave reviews and became the standard textbook on the subject. His many years of public contribution on MATLAB performance, plus a multitude of useful tips never before published, are now available in this highly readable volume.
Yair holds a BSc in physics and an MSc in computer science, both with high honors. Yair has over 20 years of professional software development experience at various levels of organizational responsibility, from programmer to VP R&D. He has developed systems using two dozen programming languages, on a dozen different platforms, half a dozen databases, and countless MATLAB releases.
Yair became an independent MATLAB consultant several years ago, and has never looked back. He currently assists clients world-wide in various MATLAB-related aspects: consulting, training, and programming. He can be reached at altmany@gmail.com.
Additional information can be found here.
Source codes
A zip file containing the source codes for all non-trivial functions in the book can be downloaded from here.
All the files are named s<section#>_function. For example: s6_1_7_matched_filter_spmd.m refers to the matched_filter_spmd m-function found in section 6.1.7, while s11_7_mexIO.c refers to the C-MEX function found in section 11.7. In order to run these files, you would typically need to remove the s<section#>_ prefix from their filename (i.e., creating matched_filter_spmd.m and mexIO.c).
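The renaming step can be scripted. The following is a minimal sketch, not part of the book's code: the folder names `book_sources` and `runnable` are hypothetical, and the regular expression assumes that every prefix has the form `s`, then the section numbers joined by underscores, then one more underscore.

```matlab
% Sketch: copy the downloaded sources to prefix-free file names.
% 'book_sources' and 'runnable' are hypothetical folder names.
srcDir = 'book_sources';
dstDir = fullfile(srcDir, 'runnable');

% Assumed prefix pattern: 's', section numbers joined by '_', then '_'
stripPrefix = @(name) regexprep(name, '^s\d+(_\d+)*_', '');

% Collect candidate files (an empty list if srcDir does not exist)
files = dir(fullfile(srcDir, 's*_*'));
files = files(~[files.isdir]);

% Create the destination folder only when there is something to copy
if ~isempty(files) && ~exist(dstDir, 'dir')
    mkdir(dstDir);
end

for k = 1:numel(files)
    oldName = files(k).name;
    newName = stripPrefix(oldName);
    if ~strcmp(newName, oldName)   % skip files that do not match the pattern
        copyfile(fullfile(srcDir, oldName), fullfile(dstDir, newName));
    end
end
```

Copying rather than renaming keeps the downloaded files intact. If two sections happen to define a function with the same name, the later copy would overwrite the earlier one, so in that case it may be safer to copy only the files for the section being studied.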
Errata list
Last updated: 2016-03-13
Chapter | Section | Page | Paragraph | Current text | Correction type | Corrected text | Post date |
---|---|---|---|---|---|---|---|
1 | 1.3.2 | 8 | 2nd from bottom | …We might well find that having graceful degradation means that the system is sub-optimal (a bit slower) for the common scenario (no load)… | clarification | We might well find that having graceful degradation means that the system remains reasonably responsive at high loads at the expense of being sub-optimal (a bit slower) for the common scenario (no load) – this is definitely a tradeoff worth considering. On the other hand, we might wish to take an opposite approach, so that the common scenario achieves its higher performance goal, at the expense of uncommon load scenarios. | 2015-11-18 |
3 | 3.1.15 | 79 | 2nd from bottom | c1 = bsxfun(), b(1,:)'); | fix | c1 = bsxfun(@minus, a(1,:), b(1,:)'); | 2015-01-01 |
3 | 3.2.1 | 82 | bottom | [uniqueVals, sortedStartIdx] = unique(data); | fix | [uniqueVals, sortedStartIdx] = unique(sortedData); | 2016-02-03 |
3 | 3.3 | 94 | 4th from top | (none – new paragraph) | addition | In some cases, there is no directly-callable helper function, but we can still extract the core programming logic, excluding those checks and computations that we do not specifically need. For example, when the input data is properly scaled and formatted, we can use the core logic of the polyfit function to achieve significant speedups. | 2015-11-18 |
3 | 3.5.1.6 | 105 | 5th from bottom | (none – new paragraph) | addition | If you have the Database Toolbox, use the fastinsert function to insert multiple data records into a table in bulk mode. | 2015-11-18 |
3 | 3.6.3 | 113 | 4th from bottom | (none – new paragraph) | addition | We can reduce the size of the processed data set by ignoring elements whose effect on the overall result is negligible. For example, when running a processing kernel over the data, the kernel is often Gaussian in nature, quickly dropping to negligible values away from the kernel center. In such cases we can significantly reduce the processed data size by clipping the kernel, only using values close to the center (e.g., up to 13% of the kernel center’s value, representing ±2σ, or up to 1%, which represents ±3σ). | 2015-11-18 |
4 | 4.1.3 | 143 | 2nd from top | As a counter-example, the standard fft function is faster for doubles than singles… | fix | As another example, the standard fft function is faster for singles than for doubles (ignore the part on FFTW) | 2015-01-01 |
4 | 4.5.6 | 177 | top | FFT functions are generally faster for double-precision data than for singles… | fix | FFT functions are generally faster for single-precision data than for doubles (ignore the part on FFTW) | 2015-01-01 |
8 | 8.1.3 | 394 | 5th from top | printf("s() called with... | fix | printf("%s() called with... | 2016-03-13 |
8 | 8.1.5 | 403 | footnote | See §8.1.6.3 | fix | See §8.1.7.3 | 2015-03-17 |
8 | 8.1.6 | 405 | 6th from top | mgGetProperty | fix | mxGetProperty | 2015-03-10 |
8 | 8.5.1 | 451 | 5th from top | (none – new paragraph) | addition | When loading shared libraries into MATLAB, it is faster (sometimes significantly) to call loadlibrary with prototype and thunk file inputs rather than the standard header file. This avoids the need for MATLAB to recompile the header file(s) at run time. | 2015-11-18 |
9 | 9.4.3 | 485 | bottom | (none – new paragraph) | addition | Note that in MATLAB R2015b (8.6), the allocation mechanism has significantly changed, and so has the relative performance of the various preallocation variants. On these new releases, it is important to compare the variants carefully on your specific platform. | 2015-11-18 |
9 | 9.5.3 | 504 | 5th from top | Accesss=private | fix | Access=private | 2015-11-18 |
10 | 10.1.2 | 531 | 5th from bottom | (none – new text) | addition | Smaller markers are apparently faster than large ones. Marker performance has degraded in R2014b (HG2); it improved in R2016a, but is apparently still not back at the HG1 (R2014a) performance levels. | 2015-11-18 |
10 | 10.1.12.3 | 544 | 2nd from bottom | (none – new text) | addition | Using rgb2ind with up to 256 colors uses uint8 data while a larger number of colors uses uint16. In general, the larger the number of colors, the longer it takes rgb2ind to process the data. For rgb2ind, it is also faster to use the non-default 'nodither' input option, but note that this may degrade the output quality. | 2015-11-18 |
10 | 10.4.3.8 | 590 | 2nd from top | if now - lastTime > 0.3*ONE_SEC | fix | thisTime = now; | 2015-01-30 |
11 | 11.3.4 | 603 | 4th from top | (none – new paragraph) | addition | It often happens with binary data files that the data is stored in repeated patterns. It is natural to use an iterative loop in such cases, reading the pattern elements one by one within a loop over all patterns. However, for the reason outlined above this is extremely inefficient. It is both faster and simpler to use textscan in order to read all patterns and their elements in one go. | 2015-11-18 |
11 | 11.4 | 611 | bottom | (none – new paragraph) | addition | When using the save function in MATLAB R2014b (8.4) onward, ensure that the saved data does not include any graphic handles. If it does, then all the graphics data associated with this handle will be stored, and this could amount to many MB of data even for simple graphic objects. In R2014a and earlier, the graphic system used numeric handles, so storing them only meant storing a single number, but in MATLAB’s new graphics system (HG2) these handles became class objects that are being stored with all their associated data. This bloats the file and increases the time to save and load the file. | 2015-11-18 |
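
The kernel-clipping addition to section 3.6.3 can be illustrated with a short sketch. This is not the book's code: the kernel width `sigma`, the ±5σ "full" support, and the random test data are assumptions chosen for illustration. The 13% and 1% thresholds quoted in the erratum correspond to the relative height of a Gaussian at ±2σ and ±3σ, that is exp(-2) and exp(-4.5).

```matlab
% Sketch: clip a Gaussian kernel at ±3σ and compare against a wider kernel.
sigma = 2;                                   % assumed kernel width, in samples

% Relative kernel height (vs. the center value) at ±2σ and ±3σ
relAt2 = exp(-2^2 / 2);                      % about 0.135 (the "13%" level)
relAt3 = exp(-3^2 / 2);                      % about 0.011 (the "1%" level)

% "Full" kernel: assumed here to span ±5σ (21 taps for sigma = 2)
xFull = -5*sigma : 5*sigma;
fullKernel = exp(-xFull.^2 / (2*sigma^2));
fullKernel = fullKernel / sum(fullKernel);   % normalize to unit sum

% Clipped kernel: keep only the taps within ±3σ (13 taps for sigma = 2)
xClip = -3*sigma : 3*sigma;
clippedKernel = exp(-xClip.^2 / (2*sigma^2));
clippedKernel = clippedKernel / sum(clippedKernel);

% Hypothetical data set: uniform random samples in [0,1]
data = rand(1, 1e5);
yFull = conv(data, fullKernel, 'same');
yClip = conv(data, clippedKernel, 'same');

% Largest deviation introduced by clipping
maxDiff = max(abs(yFull - yClip));
fprintf('taps: %d -> %d, max abs difference: %.2g\n', ...
        numel(fullKernel), numel(clippedKernel), maxDiff);
```

In this setup the clipped kernel uses 13 taps instead of 21, and the result differs only slightly because the discarded taps carry very little weight. Whether that deviation is acceptable depends on the application, so it is worth measuring it on representative data before adopting a clipping threshold.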