Faster partial builds: 1M tasks in 10 seconds
Incremental builds in large projects can suffer from unnecessary overhead. This is typically mitigated by splitting the project into several components. Other common approaches reduce the number of task objects to process by creating task batches or by running unity builds. Such optimizations are typically implemented as Waf extensions because they tend to add assumptions that may exclude useful scenarios.
Creating such optimizations is typically an iterative process: start from a profile, make a few assumptions, and then implement a solution. The purpose of this post is to illustrate this process by improving the incremental build times of a very large simulated project. As a disclaimer, it must be mentioned that the simulated project does not correspond to a real-world case and is only meant for illustration purposes.
A large test project
A project generator in the Waf source tree can be used to generate C++ projects. In this case, 2,000 static libraries are created, each containing 500 classes (one source file and one header per class). Each source file randomly includes 15 headers from the same library and 5 headers from other libraries of the project.
This project thus contains 1 million mostly empty C++ files
(with 1 million headers) for a total of 2,000 static libraries.
A full build takes 55 minutes (this can be reduced to 20 minutes by using nobuild
), and an incremental build takes about 7 minutes after changing one file.
Let us view the profile for the incremental build by using the --profile
switch:
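For instance, assuming the profile data is also written to a profile.txt file in cProfile format (an assumption for this sketch), it can be examined with the standard pstats module:

    # Hypothetical inspection of the profile dump produced by --profile,
    # assuming a cProfile-format file named profile.txt
    import pstats

    stats = pstats.Stats('profile.txt')
    stats.sort_stats('cumulative').print_stats(20)  # show the 20 most expensive entries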
Nothing really stands out at this point, aside from the massive number of tasks to process (1M).
Assumptions
We will now list a few assumptions that apply to this project and try to derive some optimizations:
- There are lots of targets with link tasks and inter-dependent libraries:
  - Object-only targets can be ignored for the time being.
  - Each task generator holds a fraction of the tasks from the total.
- The project phase is simple:
  - Full project builds run without --targets or pruning from sub-folders.
  - The installation phase is unnecessary.
- Task generators are statically defined:
  - Task generator source files are not obtained from globs.
  - use= dependencies are fully described in user scripts.
- The project is large, so it is unlikely to be located over a network file system; file modification times (timestamps) are therefore reliable.
One simple idea is to avoid creating all task objects when performing incremental builds by creating a map of tasks to the dependency files and using file timestamps to detect changes very quickly. Subsequent builds can then detect which files have changed and apply the relevant task generators, getting through the whole build much faster.
Adding a Waf extension
We do not really want to modify the Waf implementation at this point, so we
will instead add all the code for our experiments to a Waf extension.
The file fast_partial.py
is created in the directory of the project,
and then included in the wscript
file by trying to load the extension options:
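For instance, a minimal wscript sketch could look as follows; the tooldir argument is an assumption that the extension file sits next to the top-level wscript:

    def options(opt):
        # try to load the extension; it does not have to provide an options() function
        opt.load('fast_partial', tooldir='.')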
No error is raised if the extension does not provide any command-line options.
Any class defined in the extension will be loaded when the build starts,
so that no project re-configuration is necessary.
The sections below will now focus on the implementation in fast_partial.py
.
Mapping files to task generators
The first step is to create a build context subclass that creates or updates this mapping. Since the store method is called after a build is finished, we override it to call a new store_timestamps method that maintains the mapping.
To apply the mapping, the get_build_iterator method is then overridden to call a new compute_needed_tgs method which applies the filter. Finally, a new is_stale method is bound to task generators to detect when tasks really need to be created.
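The following is a much simplified sketch of these ideas, not the final extension: the method names follow the text, the timestamp database file name is arbitrary, the filter is applied here by also overriding post_group, and use= dependencies between targets are ignored.

    import os, pickle
    from waflib import Build
    from waflib.TaskGen import taskgen_method

    TSTAMP_DB = '.wafpickle_tstamps'  # arbitrary name for this sketch

    class bld(Build.BuildContext):
        def store(self):
            # called once the build is over: save the file/timestamp mapping
            self.store_timestamps()
            super(bld, self).store()

        def store_timestamps(self):
            # map each task generator to the files its tasks depend on and
            # remember the current timestamps of those files
            data = getattr(self, 'saved', None) or {'deps': {}, 'tstamps': {}}
            for group in self.groups:
                for tg in group:
                    tasks = getattr(tg, 'tasks', None)
                    if not tasks:
                        continue  # not posted in this run, keep the previous record
                    files = []
                    for tsk in tasks:
                        for node in tsk.inputs + self.node_deps.get(tsk.uid(), []):
                            files.append(node.abspath())
                    data['deps'][tg.name] = files
                    for path in files:
                        try:
                            data['tstamps'][path] = os.stat(path).st_mtime
                        except OSError:
                            pass
            with open(os.path.join(self.variant_dir, TSTAMP_DB), 'wb') as f:
                pickle.dump(data, f)

        def get_build_iterator(self):
            # decide which task generators need processing before tasks are created
            self.compute_needed_tgs()
            return super(bld, self).get_build_iterator()

        def compute_needed_tgs(self):
            try:
                with open(os.path.join(self.variant_dir, TSTAMP_DB), 'rb') as f:
                    self.saved = pickle.load(f)
            except (EnvironmentError, EOFError):
                self.saved = None
            for group in self.groups:
                for tg in group:
                    if hasattr(tg, 'is_stale'):
                        tg.staleness = tg.is_stale()

        def post_group(self):
            # only stale task generators create their tasks
            for tg in self.groups[self.current_group]:
                if getattr(tg, 'staleness', True) and hasattr(tg, 'post'):
                    tg.post()

    @taskgen_method
    def is_stale(self):
        # tasks must be created if a recorded file is missing or has a new timestamp
        saved = getattr(self.bld, 'saved', None)
        if not saved or self.name not in saved['deps']:
            return True
        for path in saved['deps'][self.name]:
            try:
                if os.stat(path).st_mtime != saved['tstamps'].get(path):
                    return True
            except OSError:
                return True
        return False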
With these changes, the runtime already shows a nice improvement. There is now a noticeable gap between an incremental build that processes one changed file and a no-op build in which all files are up to date.
We will now revisit the profile data for the incremental build:
The next issue is that the main data file still needs to be loaded into memory when a build starts (322MB):
Splitting the main data file
One obvious approach is to use one such data file per task generator instead of a single global one for the whole build. To do this, we recall that each task generator is bound to a build context, and provide a separate build context object for each task generator. This is achieved by adding a new build context proxy class.
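A simplified sketch of such a proxy is shown below; the per-target store and restore logic and the dedicated Node class are only stubbed out here, while the real extension is more complete.

    class bld_proxy(object):
        def __init__(self, bld):
            object.__setattr__(self, 'bld', bld)
            # per-target build data, kept separate from the global context
            object.__setattr__(self, 'node_deps', {})
            object.__setattr__(self, 'raw_deps', {})
            object.__setattr__(self, 'task_sigs', {})

        def __setattr__(self, name, value):
            # the few attributes assigned on the proxy stay on the proxy
            object.__setattr__(self, name, value)

        def __getattribute__(self, name):
            # return local attributes when present, delegate everything else
            # to the global build context
            try:
                return object.__getattribute__(self, name)
            except AttributeError:
                return getattr(object.__getattribute__(self, 'bld'), name)

        def store(self):
            # persist node_deps/raw_deps/task_sigs for this target only
            # (stubbed out: the real extension pickles one file per target
            # and uses a dedicated Node class for serialization)
            pass

        def restore(self):
            pass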
Proxy objects reference the global build context and in general have only a few attributes of their own, set through __setattr__. Most attribute lookups are delegated to the global context by way of __getattribute__.
Next, the proxy re-implements the store and restore methods so that build data is actually read and written per target. Since the build cache serializes Node objects, the proxy must also provide its own Node class to avoid Python serialization errors.
Finally, a new instance of the proxy is created whenever a task generator is instantiated, by overriding the __call__ method of our main build context.
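As a sketch, this mirrors the default __call__ implementation but hands each task generator a proxy; the real extension also propagates the configuration environments to the proxy.

    from waflib import Build, TaskGen

    class bld(Build.BuildContext):
        # continuing the bld subclass sketched earlier
        def __call__(self, *k, **kw):
            # same as the default implementation, except that the task
            # generator is bound to a proxy instead of the global context
            kw['bld'] = bld_proxy(self)
            ret = TaskGen.task_gen(*k, **kw)
            self.task_gen_cache_names = {}
            self.add_to_group(ret, group=kw.get('group'))
            return ret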
Since timestamps are being used, it is tempting to leverage the md5_tstamp extension, which hashes files only when a change is detected. This does not work out of the box because of the new proxy, so a custom implementation is necessary; it can be provided by replacing the default method through monkey-patching.
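One way to do this, sketched below in the spirit of the md5_tstamp extension, is to replace Node.h_file and keep the cache in a module-level dictionary so that it does not matter whether a node belongs to a proxy or to the global context:

    import os
    from waflib import Node, Utils

    hashes_md5_tstamp = {}  # file path -> (timestamp, hash)

    def h_file(self):
        filename = self.abspath()
        st = os.stat(filename)
        entry = hashes_md5_tstamp.get(filename)
        if entry and entry[0] == st.st_mtime:
            # timestamp unchanged: reuse the previous hash
            return entry[1]
        # timestamp changed or file unknown: hash the file contents
        ret = Utils.h_file(filename)
        hashes_md5_tstamp[filename] = (st.st_mtime, ret)
        return ret

    h_file.__doc__ = Node.Node.h_file.__doc__
    Node.Node.h_file = h_file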
With this scheme, incremental build times are substantially reduced:
The full build time remains essentially the same, though memory consumption increases substantially (from 4GB to 9GB). The full build also tends to start processing the first tasks a little faster.
Conclusion
This post has shown the application of a few optimizations to improve the incremental build time of a large simulated project after reviewing profile data and making assumptions about the build structure. The incremental build time went down from 7 minutes to 10 seconds.
The current implementation can be found in the extras folder of the Waf 2.0 source tree. A few of the assumptions above, such as the restriction on globs, can be worked around, but this is deemed unnecessary for the purposes of this article.