Using strace to obtain build dependencies
Build dependencies are usually tracked either manually (input files listed in the build scripts) or through a dependency generator (a dependency scanner function in the case of Waf, or a compiler invoked to dump the dependencies). Another possibility is to obtain the complete list of files accessed by the compilers by tracing their system calls with a tool such as strace. Here is an example of strace output:
436242 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
436242 stat("test.out", 0x7fffd0ab8b00) = -1 ENOENT
436242 stat("/etc/fstab", {st_mode=S_IFREG|0664, st_size=458, ...}) = 0
436242 stat("test.out", 0x7fffd0ab8890) = -1 ENOENT
436242 open("/etc/fstab", O_RDONLY) = 3
436242 open("test.out", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
436242 exit_group(0) = ?
436242 +++ exited with 0 +++
The strace approach is used by the Fabricate build tool, which is advertised as a better build tool. While this sounds like a brilliant idea in theory, strace is available only on some operating systems, and even there it is not always installed by default. Fabricate will silently ignore the dependencies it cannot find. And while other tracing tools exist (truss, dtrace), they would likely require some adaptation.
Waf aims to be a reliable and portable solution, so it will not depend on such a technique by default. There are also many cases where manually written dependency scanners are necessary, for example when processing Qt files. Still, I decided to play with the idea to see what an implementation would look like. Writing an extension that disables the scanners and runs the commands through strace turned out to be very easy, but parsing the strace output proved more complicated.
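The wrapping step can be sketched in Python, the language Waf is written in. This is not the extension's actual code: the `wrap_with_strace` helper and the selection of traced system calls are assumptions for illustration. The strace options themselves are standard ones: `-f` follows child processes, `-o` writes the trace to a file, and `-e trace=` restricts which system calls are recorded.

```python
# Hypothetical selection of system calls to record; the real extension may
# choose differently. The names are those of Linux on x86-64.
TRACED_CALLS = ("open", "openat", "stat", "chdir", "execve", "clone", "fork", "vfork")


def wrap_with_strace(cmd, trace_file):
    """Return an argv that runs `cmd` under strace.

    -f follows child processes (cc1, as, ...); because the trace of all
    processes goes to a single file with -o, strace prefixes every line
    with the pid, as in the output above.
    """
    return ["strace", "-f", "-o", trace_file,
            "-e", "trace=" + ",".join(TRACED_CALLS)] + list(cmd)
```

It could be run with `subprocess.run(wrap_with_strace(["gcc", "-c", "main.c"], "main.trace"), check=True)`. Other architectures name some system calls differently (the generic table used by aarch64 has no plain `open` or `stat`, for instance), and strace may refuse names it does not know, so the list would need adjusting per platform.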
Compilers such as gcc can change directory (chdir) and spawn new processes (for example /usr/bin/as). They may then open files using relative or absolute paths, and not all of these attempts succeed. Interpreting the strace output therefore requires keeping track of the current working directory of every sub-process. The output can also grow quite large, which makes it tricky to process efficiently; iterating over the lines with a single regular expression seems to do the job, at least.
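A minimal sketch of the single-regular-expression approach, assuming every line starts with a pid (strace run with `-f -o`) and has the shape `PID call("first argument", ...) = RESULT`. The pattern and the `parse_lines` helper are hypothetical. Lines that strace splits into `<unfinished ...>` and `<... resumed>` halves, which happens when several processes run at the same time, are simply skipped here rather than reassembled.

```python
import re

# One pattern for a whole line:  PID call("first string argument", ...) = RESULT
LINE_RE = re.compile(
    r'^(?P<pid>\d+)\s+'
    r'(?P<call>\w+)\('
    r'(?:"(?P<path>(?:[^"\\]|\\.)*)")?'  # optional quoted first argument
    r'.*\)\s+=\s+(?P<ret>-?\d+|\?)'      # numeric return value, or "?"
)


def parse_lines(lines):
    """Yield (pid, call, path, ret) for every line the pattern matches.

    path is None when the first argument is not a quoted string, and ret is
    None for calls that do not return (exit_group). Lines that do not end in
    ") = RESULT", such as "+++ exited ..." or "<unfinished ...>", are skipped.
    """
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ret = m.group("ret")
            yield (int(m.group("pid")), m.group("call"), m.group("path"),
                   None if ret == "?" else int(ret))
```

Matching each line once with a precompiled pattern keeps the work linear in the size of the trace, which matters when the output grows large.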
The following trace extract shows the system calls of a shell script (pid 436292) changing its directory to '/etc' and then calling a command to copy a file (pid 436293):
436292 execve("stracedeps/foo.sh", ..
436292 chdir("/etc") = 0
436292 stat("/sbin/cp", 0x7fff6e640c20) = -1 ENOENT (No such file or directory)
436292 stat("/bin/cp", {st_mode=S_IFREG|0755, st_size=150912, ...}) = 0
436292 clone(child_stack=0, ..) = 436293
436293 execve("/bin/cp", ["cp", "fstab", "../home/user/test.out"],
436293 open("fstab", O_RDONLY) = 3
436293 open("../home/user/test.out", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
436293 exit_group(0) = ?
436292 exit_group(0) = ?
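Tracking the working directory per process can be sketched as follows. This is a simplified illustration, not the extension's code: `collect_accesses` is a hypothetical helper, only `open`, `chdir` and `clone` lines are considered, and threads created with `CLONE_FS` (which share their parent's directory) are treated like separate processes. A child receives a copy of its parent's directory at `clone` time, relative paths are resolved against the directory of the process that opened them, and failed calls (negative return values) are dropped. Note that `posixpath.normpath` collapses `..` lexically, which can give wrong results when symbolic links are involved.

```python
import posixpath
import re

# Only the calls needed for this sketch; "rest" holds the remaining arguments.
CALL_RE = re.compile(
    r'^(?P<pid>\d+)\s+(?P<call>open|chdir|clone)\('
    r'(?:"(?P<path>(?:[^"\\]|\\.)*)")?(?P<rest>.*)\)\s+=\s+(?P<ret>-?\d+)'
)


def collect_accesses(lines, start_cwd):
    """Return (reads, writes): sets of absolute paths opened successfully."""
    cwd = {}                          # pid -> current working directory
    reads, writes = set(), set()
    for line in lines:
        m = CALL_RE.match(line)
        if m is None:
            continue                  # another system call, or unparsed line
        pid, call, ret = int(m.group("pid")), m.group("call"), int(m.group("ret"))
        if ret < 0:
            continue                  # failed call: not a dependency
        # A pid never seen before is assumed to be the top-level process.
        here = cwd.get(pid, start_cwd)
        if call == "clone":
            cwd[ret] = here           # the child inherits the parent's cwd
            continue
        # Resolve relative paths against this process's own directory.
        path = posixpath.normpath(posixpath.join(here, m.group("path")))
        if call == "chdir":
            cwd[pid] = path
        elif "O_WRONLY" in m.group("rest") or "O_RDWR" in m.group("rest"):
            writes.add(path)
        else:
            reads.add(path)
    return reads, writes
```

Applied to the trace above, it reports `/etc/fstab` as read and `/home/user/test.out` as written: the child inherits `/etc` from the script, so `../home/user/test.out` resolves from there.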
In addition, double-quote characters in file names are preceded by backslashes in the strace output, so an additional transformation is needed. An adequate regular expression for such a quoted string is "([^"\\]|\\.)*": between the enclosing quotes, a string is a sequence of characters that are either neither quotes nor backslashes, or any character preceded by a backslash. I have reported this problem to the Fabricate issue tracker, since it leads to missing dependencies.
436323 open("waf/playground/stracedeps/\"hi\".txt", O_RDONLY) = 3
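The escaping rule can be sketched as follows: the pattern from the text is wrapped in a capture group, and the hypothetical `first_quoted_path` helper then undoes the `\"` and `\\` escapes. Other escapes that strace may print for unusual bytes (such as `\n` or octal codes) are deliberately left untouched in this sketch.

```python
import re

# The pattern from the text, with the string contents in a capture group.
QUOTED_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def first_quoted_path(line):
    """Return the first quoted argument of a strace line, unescaped, or None."""
    m = QUOTED_RE.search(line)
    if m is None:
        return None
    # Undo only the \" and \\ escapes; any other backslash sequence is kept.
    return re.sub(r'\\(["\\])', r'\1', m.group(1))
```

For the line above this returns `waf/playground/stracedeps/"hi".txt`, whereas a naive pattern such as `"[^"]*"` would stop at the first escaped quote and yield `waf/playground/stracedeps/\`.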
While the new Waf strace extension returns interesting results, build performance takes a major hit: a C++ benchmark build taking 56s takes 1m28s when the commands are run through strace, and 1m31s with complete dependency processing. I would like to try dtrace on FreeBSD to see whether the tracing can be faster, but dtrace appears to require root privileges at this time.
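Expressed as relative overheads, these timings also show that most of the cost (32 of the 35 extra seconds) comes from running the commands under strace rather than from processing its output. The `overhead` helper below is only a small illustration of that arithmetic.

```python
def overhead(baseline_s, traced_s):
    """Relative slowdown of a build compared with the untraced baseline."""
    return (traced_s - baseline_s) / baseline_s


BASELINE = 56          # plain build: 56s
STRACE_ONLY = 60 + 28  # 1m28s: commands run through strace
WITH_DEPS = 60 + 31    # 1m31s: strace plus dependency processing

print(f"strace only:     +{overhead(BASELINE, STRACE_ONLY):.1%}")  # +57.1%
print(f"with processing: +{overhead(BASELINE, WITH_DEPS):.1%}")    # +62.5%
```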