Faster partial builds: 1M tasks in 10 seconds
Incremental builds in large projects can sometimes have some unnecessary overhead. This are typically mitigated by splitting up the project into several components. Other common approaches tend to minimize the amount of task objects to process by creating task batches or by running unity builds. Such optimizations are typically implemented as Waf extensions because they tend to add assumptions that may exclude useful scenarios.
Creating such optimizations is typically done in an iterative fashion, starting from a profile, making a few assumptions and then implementing a solution. The purpose of this post is to illustrate this process on a very large project simulation on the process of improving incremental build times. As a disclaimer, it must be mentioned that the simulated project does not correspond to a real-world case and was only meant for illustration purposes.
A Large test project
A project generator in the Waf source tree can be used to generate a C++ projects. In this case there will be 2000 static libraries using 500 classes (one source file and one header per class). Each source file randomly includes 15 headers from the same library and 5 headers from other libraries of the project.
This project thus contains 1 million mostly empty C++ files
(with 1 million headers) for a total of 2,000 static libraries.
The full build time takes 55 minutes (this can be reduced to 20 minutes by using nobuild
), and an incremental build takes about 7 minutes after changing one file.
Let us view the profile for the incremental build by using the --profile
switch:
Nothing really stands out at this point, aside from the massive amount of tasks to process (1M).
Assumptions
We will now list a few assumptions that apply to this project and try to derive some optimizations:
- There are lots of targets with link tasks and inter-dependent libraries:
- Object-only targets can be ignored for the time being.
- Each task generator holds a fraction of the tasks from the total.
- The project phase is simple:
- Full project builds without
--targets
or pruning from sub-folders - The installation phase is unnecessary
- Full project builds without
- Task generators are statically defined:
- task generator source files are not obtained from globs
use=
dependencies are fully described in user scripts
- The project is large so it is unlikely to located over a network file system, therefore file modification changes (timestamps) are reliable.
One simple idea is to avoid creating all task objects when performing incremental builds by creating a map of tasks to the dependency files and using file timestamps to detect changes very quickly. Subsequent builds can then detect which files have changed and apply the relevant task generators, getting through the whole build much faster.
Adding a Waf extension
We do not really want to modify the Waf implementation at this point, so we
will instead add all the code for our experiments to a Waf extension.
The file fast_partial.py
is created in the directory of the project,
and then included in the wscript
file by trying to load the extension options:
No error is raised if the extension does not provide any command-line options.
Any class defined in the extension will be loaded when the build starts,
so that no project re-configuration is necessary.
The sections below will now focus on the implementation in fast_partial.py
.
Mapping files to task generators
The first step is to create a build context subclass to create or update this mapping. Since the store method is called after a build is finished, we override it and have it call a new store_timestamps method to create or update our new mapping.
To apply the mapping, the method get_build_iterator is then overridden to call a new compute_needed_tgs method which will apply the filter. Finally, a new is_stale method is bound to task generators to detect when tasks really need to be created:
With these changes, the runtime already shows a nice improvement. There is a noticeable gap between an incremental build processing one file, and an incremental build with all files up-to-date (no-op).
We will now revisit the profile data for the incremental build version:
The next issue is that the main data file still needs to be loaded in memory when a build starts (322MB):
Splitting the main data file
One obvious approach is to have one of such data file per task generator instead of a global one for the whole build. To do this, we remember that each task generator is bound to a build context, and then implement add a build context object for each task generator. This is achieved by adding a new build context proxy class:
Proxy objects reference a global build context and will in general have only few properties set using __setattr__. They may return attributes from the global context in most cases through the usage of __getattribute__.
Next, the proxy re-implements the store and restore methods so that build data gets actually read and written for each target. Since the build cache serializes Node objects, it is also necessary for the proxy to provide its own Node class and avoid Python serialization errors.
Finally, a new instance of the proxy is created whenever a new task generator is instantiated. The __call__ method of our main build context is overridden for this effect:
Since timestamps are being used, it becomes tempting to leverage the md5_tstamp extension that hashes files only when a change is detected. This will not work out of the box because of the new proxy, so a custom implementation is necessary, this can be achieved by replacing the default method through monkey-patching:
With this scheme, incremental builds are substantially reduced:
The full build time remains essentially the same, though memory consumption increases substantially (4GB to 9GB). The fully build also tends to start processing the first tasks a little faster.
Conclusion
This post has shown the application of a few optimizations to improve the incremental build time of a large simulated project after reviewing profile data and making assumptions about the build structure. The incremental build time went down from 7 minutes to 10 seconds.
The current implementation can be found in the extras folder of the Waf 2.0 source tree. A few assumptions above such as limiting the usage of globs can be worked around, but this is deemed unnecessary for the purposes of this article.
How to verify LibreSSL release files
Like many other sites, waf.io is now using LibreSSL in order to support the s as secure in https. Since LibreSSL is still a relatively new project, there are only few Linux distribution packages, so we are building it ourselves from the source code ourselves. Using OpenBSD directly might become an option for us once SNI is finally supported in HTTPD and binary patches make system updates easier (maybe in OpenBSD 6.1).
Although obtaining the source code is easy, both LibreSSL project and download pages are over http only, so the data can be tampered with during transfer. Do not LibreSSL developers trust their own library sufficiently to dare to use it? Where are the download verification instructions?
The release files are at least signed with GPG, but the corresponding public key is only present over http. That key does not seem to be used to sign the project commits either. While signing commits does make a signing key valid, it increases the amount of data to tamper with, and thus increases the confidence in the usage of the signing key and in the state of the source tree. This is the reason why most commits in the Waf source tree are signed with the release key for instance:
Fortunately for LibreSSL, their release files are also co-signed with the new Signify tool from the OpenBSD project, and the public key is at least also present in the project source tree on Github and can therefore be obtained over https at least (it would be best if the GPG public key were also signed using Signify as other release files by the way).
The Signify manual page is way shorter than the GPG one, yet it fails to describe the expected file formats and the underlying algorithms that would make one want to trust it. It is also a pity that the newly-invented signature format prevents usage of signatures as Python/Ruby/Perl commented lines, that keeping signatures in archive files is not supported (jarsigner for Jar files), and that chains of trust to verify files is inexistent (certificates).
The Signify application is also a bit difficult to find on Linux distributions as there is another unrelated application named signify that generates random email signatures, but on Debian the package signify-openbsd
can be installed directly. In the end I deemed the public key (RWQg/nutTVqCUVUw8OhyHt9n51IC8mdQRd1b93dOyVrwtIXmMI+dtGFe) sufficiently trustworthy and ran the following commands to fetch and verify the latest LibreSSL release:
Let us hope that the download and verification instructions will be easier to follow in the future.
Techniques for stripping binaries
Programs and shared libraries can contain symbol information that increase the size of the data to redistribute and that facilitates reverse-engineering. The linkers/compilers do not provide a way of stripping symbols while creating the files, so in practice the strip program needs to be executed on the resulting binary files. The build process must then place barriers to prevent usage of such binaries before they are ready.
A fairly common approach consists in running strip only on files that are installed. Overriding the method copy_fun
on the installation class provides a coarse-grained way of achieving this. In the following example, the strip command is called on files comes from a link task:
If stripping is required during the build phase, then the build order must be set so that dependent tasks wait for the binaries to be ready. A reliable implementation may be difficult to achieve for partial builds if the strip operation is modeled as a Task object as the task does not know the full list of downstream dependencies beforehand. The following hack hack provides a rather easy way to force a particular order though. In the following example, both inputs and outputs for the strip task are set to the same file:
The main drawback of this solution is that a deadlock will be observed if several post-processing operations are declared for the same file. Besides that, the strip task object is not really necessary in the first place; removing it can significantly reduce the amount of console logs for a whole build.
My favorite approach consists in chaining the run method through inheritance. In the example below, the function wrap_compiled_task
creates subclasses with the same name as the original. A Python metaclass bound to parent Task class translates the run_str
attribute into a run
method so that a long python function does not need to be written down. That metaclass also registers the last class created so that cls3
replaces cls1
in Task.classes[classname]
as default. One must be careful to load C/C++/Fortran Waf tools first else the code will not find any class to subclass:
The techniques described above above are not exclusively for stripping binaries though. They may be precious in situations where built files need to be modified after a compiler was executed. The complete examples for this post can be found in the following folder.
How to build static libraries from shared objects
It is sometimes necessary to build and redistribute static libraries that for future use in shared libraries. On Linux -fPIC
is usually passed to the arguments of the compiler. The first attempts at writing a portable wscript file typically resemble the following:
The problem with this approach is that both -fPIC
and -shared
flags are then propagated to all downstream targets, with the annoying side effect of silently turning programs into shared libraries. Flag propagation can be easily stopped by replacing use
by uselib
:
While this works, the code above is more difficult to understand as the uselib
keyword is less frequently used. The best approach may be to declare explicitly the flags instead. In this case it is sufficient to specify the cflags
(cxxflags
in C++ or fcflags
in Fortran):
Five useful command-lines
Command-line tools such as ls
can almost have too many options. Fortunately, aggregates can be formed instead of passing flags one by one. For example, a long command such as ls -l -r -t
can be simplified to ls -lrt
. And as repeated letters usually lack side effects, amusing words can be obtained such as netstat -satan
.
Here are five useful and easy-to-memorize command lines that have been used for the creation of waf.io: