Single file Python apps

2010-02-02 20:41:00 +0100

Various applications exist to ease software redistribution and packaging, for example:
klik
cpan
easy_install
gems

Yet, redistribution is even easier when the application is platform-independent (scripts) and when the application is directly executable.

A naive approach for Python software is to concatenate all the dependent scripts into a single script. In the case of Waf, the tools are actually plugins which are not meant to be loaded all at once. Also, the resulting script would be quite large.

The best idea so far is to create an archive (in the tar file format) and to encode it as a base64 string embedded in the final script. The script, containing only a few routines for decompressing the archive, would decode the string and unpack the library into a hidden folder when executed. All the program logic would then come from the unpacked library files.
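As a rough illustration of this idea (not the actual Waf code), the sketch below packs a few files into a bzip2-compressed tar archive, embeds it as a base64 string and generates the text of a script that can unpack the archive again. The names pack_files, make_script, PAYLOAD and unpack are made up for the example, and the destination folder is left to the caller.

```python
import base64
import io
import tarfile


def pack_files(files):
    """Build an in-memory bzip2-compressed tar archive from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_script(archive):
    """Return the source of a script that carries the archive as base64."""
    # base64 output only uses A-Z, a-z, 0-9, '+', '/' and '=',
    # so it can be placed between quotes without escaping.
    payload = base64.b64encode(archive).decode("ascii")
    return (
        "import base64, io, tarfile\n"
        f"PAYLOAD = '{payload}'\n"
        "def unpack(dest):\n"
        "    raw = base64.b64decode(PAYLOAD)\n"
        "    with tarfile.open(fileobj=io.BytesIO(raw), mode='r:bz2') as tar:\n"
        "        tar.extractall(dest)\n"
    )
```

The generated script only depends on the standard library, so it can be copied around as a single file and call unpack() on a hidden folder of its choice at start-up.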

The base64 encoding is safe in the sense that any binary string given as input is transformed into a string containing only letters, digits and a few symbols (64 characters in total). Yet this operation increases the file size by about 33%.

The ascii85 encoding (used in PDF and PostScript files) uses 85 characters, some of which are less safe, such as quotes and backslashes. It has a lower overhead (25%) and only requires a few more lines of Python code.
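The overhead figures quoted for base64 and ascii85 can be checked with the base64 module of the Python 3 standard library. This is only a small verification sketch; the input is chosen so that its length is a multiple of both 3 and 4, which avoids padding effects.

```python
import base64

# 768 bytes: a multiple of 3 (base64 groups) and of 4 (ascii85 groups).
# No 4-byte group is all zeros, so ascii85 cannot use its 'z' shortcut.
data = bytes(range(256)) * 3

b64 = base64.b64encode(data)   # 3 input bytes -> 4 output characters
a85 = base64.a85encode(data)   # 4 input bytes -> 5 output characters

b64_overhead = len(b64) / len(data) - 1   # about 0.33
a85_overhead = len(a85) / len(data) - 1   # 0.25

print(f"base64:  {len(b64)} bytes, +{b64_overhead:.0%}")
print(f"ascii85: {len(a85)} bytes, +{a85_overhead:.0%}")
```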

The best encoding of all would be no encoding, but the Python interpreter does not accept arbitrary binary data in a script. CPython at least ignores all characters located in comments, between # and the end of the line (\n or \r), which makes it possible to store binary data in commented lines. With this system, the size increase for the binary data is about 2%.

The encoding scheme used for the Waf script is therefore:
1. make a compressed archive of all the files and obtain a binary string
2. find suitable 2-character escape sequences for the newline characters by scanning the binary string
3. replace the forbidden characters by the escape sequences
4. store the escape sequences, and write the binary string in a commented line
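The four steps above can be sketched as follows. This is an illustration under assumptions rather than the code shipped with Waf: the function names, the ESC_N and ESC_R variables and the #==> / #<== marker lines are chosen for the example.

```python
def find_unused(data, forbidden):
    """Replace `forbidden` in data by a 2-byte sequence absent from data."""
    for i in range(35, 125):
        for j in range(35, 125):
            if i == j:
                continue  # two distinct bytes keep the replacement unambiguous
            seq = bytes([i, j])
            if seq not in data:
                return data.replace(forbidden, seq), seq
    raise ValueError("no unused 2-byte sequence available")


def encode_payload(data):
    """Remove the newline characters from data (steps 2 and 3)."""
    data, esc_n = find_unused(data, b"\n")
    data, esc_r = find_unused(data, b"\r")
    return data, esc_n, esc_r


def embed(payload, esc_n, esc_r):
    """Store the escape sequences and the payload in a comment (step 4)."""
    header = ("ESC_N = %r\nESC_R = %r\n" % (esc_n, esc_r)).encode("ascii")
    return header + b"#==>\n#" + payload + b"\n#<==\n"
```

The candidate sequences are taken from printable ASCII characters, so they never reintroduce a newline. Since the second scan runs on the already modified data, the two substitutions have to be undone in the reverse order when decoding.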

Upon execution, the library will be unpacked by following these steps:
1. open the script being executed, and read the binary string
2. replace the escape sequences by the newline characters
3. unpack the files from the binary string to obtain the library
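The unpacking steps can be sketched in the same way. The marker lines #==> and #<== and the function names are assumptions made for this example; the escape sequences are expected to be the ones stored in the script when it was built.

```python
import io
import tarfile


def extract_payload(script, esc_n, esc_r):
    """Recover the raw archive bytes from the contents of the script."""
    marker = b"#==>\n#"
    start = script.index(marker) + len(marker)
    # The payload contains no newline, so the next one ends the comment line.
    end = script.index(b"\n#<==", start)
    payload = script[start:end]
    # Undo the substitutions in the reverse order of the encoding.
    return payload.replace(esc_r, b"\r").replace(esc_n, b"\n")


def unpack(archive, dest):
    """Extract the archive (compression is detected by tarfile) into dest."""
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        tar.extractall(dest)
```

The script has to be opened in binary mode (open(path, "rb")) so that the bytes of the commented line are read back unchanged.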

In the case of Jython, the interpreter is a bit different, and will first read the whole file before parsing the Python code (the parser used by Jython seems to require that). Jython will then validate the characters present in the comment sections and throw a syntax error when trying to execute the Waf file.

Fortunately, changing the encoding declaration from 'utf-8' to 'iso8859-1' solved the problem entirely. While quite a few byte sequences are invalid in 'utf-8', binary data causes no such problem in 'iso8859-1', where every byte value maps to a character.
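The difference between the two encodings is easy to reproduce in Python 3. The short sketch below decodes all 256 byte values with both codecs; it only illustrates the property of the encodings, not the Jython parser itself.

```python
data = bytes(range(256))  # every possible byte value once

# Many byte sequences are not valid utf-8, for example a lone 0x80.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# iso8859-1 maps each byte to exactly one character, so decoding
# never fails and encoding the text again gives back the same bytes.
text = data.decode("iso8859-1")
roundtrip = text.encode("iso8859-1")

print(utf8_ok)            # False
print(roundtrip == data)  # True
```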

The last problem in Jython is the lack of bzip2 support. For now, to obtain a compatible waf script, it is necessary to build Waf with the gzip compression algorithm:
./waf-light --make-waf --zip-type=gz

Although the waf directory is only meant for preparing the Waf files, replacing the contents of the folder wafadmin and changing the name of the final application may help in building and redistributing other Python applications as well.

Eclipse password recovery (cvs, subversion, ..)

2009-07-13 16:43:00 +0200

Eclipse stores all the passwords in a file named .keyring, which makes them somewhat difficult to recover.

Or at least, it only seems difficult to do so. Using a plugin that enables BeanShell code execution, such as Eclipse-shell, the information may be recovered easily in a few lines of code:

import org.eclipse.core.internal.runtime.auth.AuthorizationDatabase;
import java.lang.reflect.Field;
import java.io.File;

// Adjust this path to the .keyring file of the Eclipse installation
File file = new File("c:/Eclipse/configuration/org.eclipse.core.runtime/.keyring");
String path = file.getAbsolutePath();
AuthorizationDatabase aut = new AuthorizationDatabase(path, "");
// Read the private field authorizationInfo through reflection
Field f = aut.getClass().getDeclaredField("authorizationInfo");
f.setAccessible(true);
f.get(aut);

Waf on jython 2.5 - conclusion

2009-07-01 00:14:00 +0200

Executing Waf on Jython is possible, but complicated and slow.

After discarding many possible causes (threading, garbage collection), the problem has finally been located in the Jython subprocess execution.

Since os.fork might allocate too much memory, Jython uses ProcessBuilder (new in Java 1.5) for executing external programs.

Update: Java itself does not seem to be slow at launching processes:

~/tmp/pb > javac Fu.java
~/tmp/pb > time java Fu
1.63user 3.87system 0:05.89elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+64outputs (1major+567624minor)pagefaults 0swaps


The code of Fu.java is the following:


import java.io.*;
import java.util.*;

public class Fu {
    public static void main(String args[]) throws IOException {
        for (int i = 0; i < 1000; ++i) {
            ProcessBuilder p = new ProcessBuilder("ls", "-l", "/tmp");
            p.start();
        }
    }
}
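For a comparable measurement from a Python interpreter, a small timing helper based on the subprocess module can be used. This is only a sketch: the function name time_launches is made up for the example, and unlike Fu.java it waits for each process to finish before starting the next one, so the figures are not directly comparable.

```python
import subprocess
import time


def time_launches(cmd, count):
    """Run cmd `count` times in a row and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        # Discard the output so that only the process launch is measured.
        subprocess.call(cmd, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start
```

Calling time_launches(["ls", "-l", "/tmp"], 1000) repeats the command used in the Java test.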

Waf on Jython 2.5

2009-06-19 22:46:00 +0200

Jython 2.5 was released a few days ago, and since Waf somehow supports it, I wanted to have a closer look. Jython is an implementation of the Python programming language for the Java virtual machine.

The supposed benefits of Jython are:
  • A different threading model able to use multicore processors
  • The use of the heavily optimized Java virtual machine
  • Using waf for web applications (Maven) where people usually do not have CPython

Unfortunately, Jython also seems to use the Java primitives for parsing the scripts, and executing the waf script will not work. The main Waf script contains a compressed data island in a comment section to ease redistribution. While CPython rightfully skips the comments (except \r characters), Jython has to validate all Unicode characters.

For now the solution to execute waf on Jython is to use the waf directory and the waf-light script:
~/jython/bin/jython waf-light

A quick benchmark can be created by using the script genbench.py located in the waf distribution:
utils/genbench.py /tmp/build 50 100 15 5

When executed, the benchmark will compile 5000 C++ files and link them by calling operating system processes. The test was executed on a laptop with more than enough RAM and a dual core processor.

The cPython command was:
waf distclean configure build clean build clean build -p -j5
The average time of execution was 2m15 (std dev: 5 secs)

The Jython command was:
~/jython/bin/jython waf-light distclean configure build clean build clean build -p -j5
The average time of execution was 5m15 (std dev: 10 secs)
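Put side by side, the two averages give the slowdown factor of Jython for this benchmark. The variable names below are only chosen for the example.

```python
cpython_s = 2 * 60 + 15   # 2m15 -> 135 seconds
jython_s = 5 * 60 + 15    # 5m15 -> 315 seconds

slowdown = jython_s / cpython_s
print(f"Jython takes {slowdown:.2f}x as long as CPython")  # 2.33x
```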

From a developer perspective, the Java virtual machine and the Jython start-up (7 seconds) slow down the edit-compile-run cycle, which is quite unpleasant (on CPython, waf starts instantly).

I will have a look at the profiles to understand the source of such a difference between CPython and Jython. For now Jython is not an interesting option for Waf users, and it probably will not make other applications such as SCons faster any time soon either.