Faster partial builds: 1M tasks in 10 seconds

2017-10-15 18:05:01 +0200

Incremental builds in large projects can carry unnecessary overhead. This is typically mitigated by splitting the project into several components. Other common approaches reduce the number of task objects to process, for instance by creating task batches or by running unity builds. Such optimizations are usually implemented as Waf extensions, because the assumptions they add may exclude useful scenarios.

Creating such optimizations is typically an iterative process: start from a profile, make a few assumptions, then implement a solution. The purpose of this post is to illustrate this process by improving the incremental build times of a very large simulated project. As a disclaimer, the simulated project does not correspond to a real-world case and is only meant for illustration purposes.

A large test project

A project generator in the Waf source tree can be used to create a large C++ project. In this case the project will consist of 2,000 static libraries of 500 classes each (one source file and one header per class). Each source file randomly includes 15 headers from the same library and 5 headers from other libraries of the project.

$ ./utils/genbench.py /temp/build 2000 500 15 5

This project thus contains 1 million mostly empty C++ files (with 1 million headers) for a total of 2,000 static libraries. A full build takes 55 minutes (this can be reduced to 20 minutes by using the nobuild extension), and an incremental build takes about 7 minutes after changing a single file. Let us view the profile of the incremental build by using the --profile switch:

$ waf --profile
'build' finished successfully (7m21.910s)
Sat Oct 14 16:25:35 2017    profi.txt

   328869198 function calls (324860660 primitive calls) in 441.939 seconds

   Ordered by: internal time
   List reduced from 538 to 75 due to restriction <75>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  6000011  215.459    0.000  215.459    0.000 {method 'read' of 'file' objects}
  1         15.830   15.830   16.458   16.458 {cPickle.loads}
  3000015   13.877    0.000   13.877    0.000 {open}
 41558744   12.361    0.000   16.030    0.000 waf/waflib/Task.py:233(priority)
 22967867   12.203    0.000  264.451    0.000 waf/waflib/Node.py:891(get_bld_sig)
 32971867   11.432    0.000   11.432    0.000 {method 'update' of '_hashlib.HASH' objects}
  3000000   10.460    0.000  247.576    0.000 waf/waflib/Utils.py:250(h_file)
 20779372    9.328    0.000   25.358    0.000 waf/waflib/Task.py:192(__lt__)
  4006076    8.443    0.000    8.443    0.000 {posix.stat}
  1000000    8.331    0.000   77.843    0.000 waf/waflib/Task.py:790(compute_sig_implicit_deps)
  3010003    7.597    0.000    8.411    0.000 waf/waflib/Node.py:378(make_node)

Nothing anomalous stands out at this point: most of the time is spent reading and hashing file contents, which follows from the massive number of tasks to process (1M).

Assumptions

We will now list a few assumptions that apply to this project and try to derive some optimizations:

  • There are lots of targets with link tasks and inter-dependent libraries:
    • Object-only targets can be ignored for the time being.
    • Each task generator holds a fraction of the tasks from the total.
  • The project phase is simple:
    • Full project builds without --targets or pruning from sub-folders
    • The installation phase is unnecessary
  • Task generators are statically defined:
    • task generator source files are not obtained from globs
    • use= dependencies are fully described in user scripts
  • The project is large, so it is unlikely to be located on a network file system; file modification times (timestamps) are therefore reliable.

One simple idea is to avoid creating all task objects during incremental builds: by keeping a mapping of task generators to their dependency files, and by using file timestamps to detect changes very quickly, subsequent builds can determine which files have changed and process only the relevant task generators, getting through the whole build much faster.
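Conceptually, the stored data might look like the following sketch (the names and layout are illustrative assumptions, not the actual file format):

# one entry per task generator: the dependency files with the
# timestamps observed during the previous build
tstamps = {
  'lib_512': {
    '/temp/build/lib_512/class_3.cpp': 1508076335.0,
    '/temp/build/lib_512/class_3.h':   1508076335.0,
  },
}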

Adding a Waf extension

We do not really want to modify the Waf implementation at this point, so all the code for our experiments will go into a Waf extension instead. The file fast_partial.py is created in the project directory, and then loaded from the wscript file in the options function:

#! /usr/bin/env python
# encoding: utf-8

top  = '.'
out  = 'out'

def options(opt):
  opt.load('compiler_cxx')
  opt.load('fast_partial', tooldir='.')

def configure(conf):
  ...

No error is raised if the extension does not provide any command-line options. Any class defined in the extension will be loaded when the build starts, so that no project re-configuration is necessary. The sections below will now focus on the implementation in fast_partial.py.

Mapping files to task generators

The first step is to create a build context subclass that maintains this mapping. Since the store method is called after a build finishes, we override it to call a new store_tstamps method which creates or updates the mapping.

from waflib import Build

class bld(Build.BuildContext):
  def store(self):
    # save the build data, then update the timestamp mapping
    pass
  def store_tstamps(self):
    # create or update the mapping of dependency file timestamps
    pass

  def get_build_iterator(self):
    # filter out up-to-date task generators before creating tasks
    pass
  def compute_needed_tgs(self):
    # use is_stale() to decide which task generators must be processed
    pass

To apply the mapping, the method get_build_iterator is then overridden to call a new compute_needed_tgs method which filters out up-to-date task generators. Finally, a new is_stale method is bound to task generators to detect when tasks really need to be created:

from waflib.TaskGen import taskgen_method

@taskgen_method
def is_stale(self):
  # return True when the tasks of this generator must be created
  pass
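For illustration, a possible implementation could compare the stored timestamps against the current ones (the tstamps mapping follows the sketch above and is an assumption; the actual extra also accounts for changes in build options and wscript files):

import os

@taskgen_method
def is_stale(self):
  try:
    stamps = self.bld.tstamps[self.name]
  except (AttributeError, KeyError):
    # no data from a previous build
    return True
  for filename, old_ts in stamps.items():
    try:
      if os.stat(filename).st_mtime != old_ts:
        return True
    except OSError:
      # a dependency file has disappeared
      return True
  return False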

With these changes, the runtime already shows a nice improvement. There is a noticeable gap between an incremental build processing one file, and an incremental build with all files up-to-date (no-op).

$ echo " " >> lib_512/class_3.cpp ; waf; waf
Waf: Entering directory `/temp/build/out'
[189/501] Compiling lib_512/class_3.cpp
Waf: Leaving directory `/temp/build/out'
'build' finished successfully (1m16.846s)

Waf: Entering directory `/temp/build/out'
Waf: Leaving directory `/temp/build/out'
'build' finished successfully (32.900s)

We will now revisit the profile data for the incremental build version:

'build' finished successfully (1m20.945s)
Sun Oct 15 14:59:15 2017    profi.txt

   23311106 function calls (23305549 primitive calls) in 80.967 seconds

   Ordered by: internal time
   List reduced from 591 to 75 due to restriction <75>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  2   43.460   21.730   44.590   22.295 {cPickle.dumps}
  2   22.246   11.123   22.876   11.438 {cPickle.loads}
  7956254    4.277    0.000    7.390    0.000 fast_partial.py:270(tstamp)
  3011352    3.128    0.000    3.128    0.000 {posix.stat}
     2000    2.269    0.001   10.683    0.005 fast_partial.py:228(is_stale)
     2002    1.002    0.001    1.002    0.001 {zip}

The next issue is that the main data file still needs to be loaded in memory when a build starts (322MB):

$ du -sm /temp/build/out/.wafpickle
322     /temp/build/out/.wafpickle-linux2-34016752-20
358     /temp/build/out/.wafpickle_tstamp_db_file

Splitting the main data file

One obvious approach is to have one such data file per task generator instead of a global one for the whole build. To do this, we remember that each task generator is bound to a build context, and provide a build context object for each task generator. This is achieved by adding a new build context proxy class:

class bld_proxy(object):
  def __init__(self, bld):
    object.__setattr__(self, 'bld', bld)
    ...

  def __setattr__(self, name, value):
    bld = object.__getattribute__(self, 'bld')
    setattr(bld, name, value)

  def __delattr__(self, name):
    bld = object.__getattribute__(self, 'bld')
    delattr(bld, name)

  def __getattribute__(self, name):
    try:
      return object.__getattribute__(self, name)
    except AttributeError:
      bld = object.__getattribute__(self, 'bld')
      return getattr(bld, name)

  ...

Proxy objects reference a global build context: attribute writes are forwarded to it through __setattr__, and attribute reads fall back to it through __getattribute__ whenever the proxy itself does not define the attribute.
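To illustrate the intended behavior (hypothetical attribute names):

proxy = bld_proxy(bld)   # wrap the global build context
proxy.env = bld.env      # attribute writes go to the global context
lst = proxy.all_envs     # not defined on the proxy: read from the global context
proxy.store()            # methods defined on the proxy class take precedence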

Next, the proxy re-implements the store and restore methods so that build data is actually read and written per target. Since the build cache serializes Node objects, the proxy must also provide its own Node class to avoid Python serialization errors.
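For illustration, the store override might save only the data of its own task generator (a sketch; the store_key identifier and the file naming are assumptions):

import os
try:
  import cPickle
except ImportError:
  import pickle as cPickle
from waflib import Build, Context

def store(self):
  # persist the saved attributes of this task generator only
  data = {}
  for x in Build.SAVED_ATTRS:
    data[x] = getattr(self, x)
  db = os.path.join(self.variant_dir, Context.DBFILE + '.' + self.store_key)
  with open(db, 'wb') as f:
    cPickle.dump(data, f, -1)
bld_proxy.store = store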

Finally, a new instance of the proxy is created whenever a new task generator is instantiated. The __call__ method of our main build context is overridden to this effect:

class bld(Build.BuildContext):
  def __call__(self, *k, **kw):
    # bind a fresh proxy to each new task generator
    bld = kw['bld'] = bld_proxy(self)
    ret = TaskGen.task_gen(*k, **kw)
    self.task_gen_cache_names = {}
    self.add_to_group(ret, group=kw.get('group'))
    ret.bld = bld
    return ret

Since timestamps are being used, it becomes tempting to leverage the md5_tstamp extension, which hashes file contents only when a timestamp change is detected. This will not work out of the box because of the new proxy, so a custom implementation is necessary; it can be achieved by replacing the default method through monkey-patching:

def h_file(filename):
  ...
waflib.Node.Node.h_file = h_file
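For reference, a minimal sketch of such a replacement, caching digests by modification time (the module-level cache is a simplification; the actual extra uses a more elaborate cache):

import os
import waflib.Node
from waflib import Utils

# digest cache: file path -> (mtime, digest)
hashes_tstamp = {}

def h_file(self):
  # bound as a Node method below, so 'self' is the file node
  filename = self.abspath()
  st = os.stat(filename)
  entry = hashes_tstamp.get(filename)
  if entry and entry[0] == st.st_mtime:
    return entry[1]
  # hash the file contents only when the timestamp has changed
  digest = Utils.h_file(filename)
  hashes_tstamp[filename] = (st.st_mtime, digest)
  return digest
waflib.Node.Node.h_file = h_file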

With this scheme, incremental builds are substantially reduced:

$ echo " " >> lib_512/class_3.cpp; waf; waf
Waf: Entering directory `/temp/build/out'
[189/501] Compiling lib_512/class_3.cpp
Waf: Leaving directory `/temp/build/out'
'build' finished successfully (9.515s)

Waf: Entering directory `/temp/build/out'
Waf: Leaving directory `/temp/build/out'
'build' finished successfully (6.693s)

The full build time remains essentially the same, though memory consumption increases substantially (from 4GB to 9GB). The full build also tends to start processing the first tasks a little sooner.

Conclusion

This post has shown the application of a few optimizations to improve the incremental build time of a large simulated project after reviewing profile data and making assumptions about the build structure. The incremental build time went down from 7 minutes to 10 seconds.

The current implementation can be found in the extras folder of the Waf 2.0 source tree. A few of the assumptions above, such as the restriction on globs, can be worked around, but this is deemed unnecessary for the purposes of this article.

How to verify LibreSSL release files

2016-10-02 08:05:01 +0200

Like many other sites, waf.io is now using LibreSSL to support the 's' as in 'secure' in https. Since LibreSSL is still a relatively new project, there are only a few Linux distribution packages for it, so we are building it from the source code ourselves. Using OpenBSD directly might become an option for us once SNI is finally supported in HTTPD and binary patches make system updates easier (maybe in OpenBSD 6.1).

Although obtaining the source code is easy, both the LibreSSL project and download pages are served over http only, so the data can be tampered with during transfer. Do the LibreSSL developers not trust their own library enough to dare to use it? And where are the download verification instructions?

The release files are at least signed with GPG, but the corresponding public key is only available over http. That key does not seem to be used to sign the project commits either. While signing commits does not in itself make a signing key valid, it increases the amount of data an attacker would have to tamper with, and thus increases the confidence in the usage of the signing key and in the state of the source tree. This is why most commits in the Waf source tree are signed with the release key, for instance:

$ git log --show-signature
commit 9ed7d41488a88935e1f6f5fccbce6397a8ac1fed
gpg: Signature made Thu 15 Sep 2016 09:36:02 PM CEST
gpg:                using RSA key 0x49B4C67C05277AAA
gpg: Good signature from "Thomas Nagy <noreply@waf.io>" [ultimate]
Primary key fingerprint: 8AF2 2DE5 A068 22E3 474F  3C70 49B4 C67C 0527 7AAA
Author: Thomas Nagy <noreply@waf.io>
Date:   Thu Sep 15 21:36:02 2016 +0200

    Expand '--foo=' with shell=False - Issue #1814

Fortunately for LibreSSL, the release files are also co-signed with the new Signify tool from the OpenBSD project, and the public key is also present in the project source tree on GitHub, so it can at least be obtained over https (it would be best if the GPG public key were also signed with Signify like the other release files, by the way).

The Signify manual page is far shorter than the GPG one, yet it fails to describe the expected file formats and the underlying algorithms that would make one want to trust it. It is also a pity that the newly-invented signature format prevents using signatures as commented lines in Python/Ruby/Perl scripts, that keeping signatures inside archive files is not supported (as jarsigner does for Jar files), and that chains of trust for verifying files (certificates) are nonexistent.

The Signify application is also a bit difficult to find in Linux distributions, as there is another unrelated application named signify that generates random email signatures; on Debian, however, the signify-openbsd package can be installed directly. In the end I deemed the public key (RWQg/nutTVqCUVUw8OhyHt9n51IC8mdQRd1b93dOyVrwtIXmMI+dtGFe) sufficiently trustworthy and ran the following commands to fetch and verify the latest LibreSSL release:

$ sudo apt-get install signify-openbsd
$ wget http://ftp.openbsd.org/pub/OpenBSD/LibreSSL/libressl-2.5.0.tar.gz
$ wget http://ftp.openbsd.org/pub/OpenBSD/LibreSSL/SHA256.sig
$ wget https://raw.githubusercontent.com/libressl-portable/portable/master/libressl.pub
$ signify-openbsd -C -p libressl.pub -x SHA256.sig libressl-2.5.0.tar.gz
Signature Verified
libressl-2.5.0.tar.gz: OK

Let us hope that the download and verification instructions will be easier to follow in the future.

Techniques for stripping binaries

2016-09-26 08:05:01 +0200

Programs and shared libraries can contain symbol information that increases the size of the data to redistribute and facilitates reverse-engineering. Linkers and compilers do not provide a way of stripping symbols while creating the files, so in practice the strip program needs to be executed on the resulting binaries. The build process must then place barriers to prevent usage of such binaries before they are ready.

A fairly common approach consists in running strip only on the files that are installed. Overriding the copy_fun method of the installation class provides a coarse-grained way of achieving this. In the following example, the strip command is called on files that come from a link task:

import shutil, os
from waflib import Build, Utils, Context

def copy_fun(self, src, tgt):
    shutil.copy2(src, tgt)
    os.chmod(tgt, self.chmod)

    if getattr(self.generator, 'link_task', None):
        if self.generator.link_task.outputs[0] in self.inputs:
            self.generator.bld.cmd_and_log('strip %s' % tgt, quiet=Context.BOTH)
Build.inst.copy_fun = copy_fun

If stripping is required during the build phase, then the build order must be set so that dependent tasks wait for the binaries to be ready. A reliable implementation may be difficult to achieve for partial builds if the strip operation is modeled as a Task object, as the task does not know the full list of downstream dependencies beforehand. The following hack provides a rather easy way to force a particular order though: both inputs and outputs of the strip task are set to the same file:

from waflib import TaskGen

@TaskGen.feature('cshlib', 'cxxshlib', 'cprogram', 'cxxprogram')
@TaskGen.after('apply_link')
def add_strip_task(self):
    if getattr(self, 'link_task', None):
        exe_node = self.link_task.outputs[0]
        # special case: same inputs and outputs for a task!
        strip_task = self.create_task('strip', exe_node, exe_node)
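The create_task call above assumes that a task class named strip exists; a minimal definition might look like this (the STRIP variable is defined at configuration time, see the configuration snippet further below):

from waflib import Task

class strip(Task.Task):
    run_str = '${STRIP} ${SRC[0].abspath()}'
    color   = 'BLUE'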

The main drawback of this solution is that a deadlock will be observed if several post-processing operations are declared for the same file. Besides that, the strip task object is not really necessary in the first place; removing it can significantly reduce the amount of console output over a whole build.

My favorite approach consists in chaining the run methods through inheritance. In the example below, the function wrap_compiled_task creates two subclasses bearing the same name as the original. A Python metaclass bound to the parent Task class translates the run_str attribute into a run method, so that a long Python function does not need to be written by hand. That metaclass also registers the last class created, so that cls3 replaces cls1 as the default in Task.classes[classname]. One must be careful to load the C/C++/Fortran Waf tools first, else the code will not find any classes to subclass:

from waflib import Task

def wrap_compiled_task(classname):
    # create subclasses and override the method 'run'
    #
    cls1 = Task.classes[classname]
    cls2 = type(classname, (cls1,), {'run_str': '${STRIP} ${TGT[0].abspath()}'})
    cls3 = type(classname, (cls2,), {})

    def run_all(self):
        if self.env.NO_STRIPPING:
            return cls1.run(self)
        ret = cls1.run(self)
        if ret:
            return ret
        return cls2.run(self)
    cls3.run = run_all

for k in 'cprogram cshlib cxxprogram cxxshlib fcprogram fcshlib'.split():
    if k in Task.classes:
        wrap_compiled_task(k)
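The ${STRIP} variable used in the run_str definitions above must be present in the environment; it can be defined at configuration time, for instance:

def configure(conf):
    conf.load('compiler_cxx')
    conf.find_program('strip', var='STRIP')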

The techniques described above are not exclusively for stripping binaries though; they may prove valuable whenever built files need to be modified after a compiler was executed. The complete examples for this post can be found in the playground folder of the Waf source tree.

How to build static libraries from shared objects

2016-09-25 18:05:01 +0200

It is sometimes necessary to build and redistribute static libraries for future use in shared libraries. On Linux, -fPIC is then usually passed to the compiler. The first attempts at writing a portable wscript file typically resemble the following:

def build(bld):
    bld.stlib(source='x.c', target='foo', use='cshlib')
    bld.program(source='main.c', target='app', use='foo')

The problem with this approach is that both the -fPIC and -shared flags are then propagated to all downstream targets, with the annoying side effect of silently turning programs into shared libraries. Flag propagation can easily be stopped by replacing use with uselib:

def build(bld):
    bld.stlib(source='x.c', target='foo', uselib='cshlib')
    bld.program(source='main.c', target='app', use='foo')

While this works, the code above is more difficult to understand as the uselib keyword is less frequently used. The best approach may be to declare the flags explicitly instead. In this case it is sufficient to specify cflags (cxxflags in C++, or fcflags in Fortran):

def build(bld):
    bld.stlib(source='x.c', target='foo', cflags=bld.env.CFLAGS_cshlib)
    bld.program(source='main.c', target='app', use='foo')

Five useful command-lines

2015-07-15 08:05:01 +0200

Command-line tools such as ls can have almost too many options. Fortunately, flags can be aggregated instead of being passed one by one. For example, a long command such as ls -l -r -t can be shortened to ls -lrt. And as repeated letters usually lack side effects, amusing words can be formed, such as netstat -satan.

Here are five useful and easy-to-memorize command lines that have been used for the creation of waf.io:

diff -burN    # compare two files or folders, ignoring whitespace
du -sm *      # compute folder/file sizes, in megabytes
ls -lart      # display a detailed file listing, ordered by time
gpg -bass     # sign a file with gpg
netstat -evil # similar to ifconfig -a