
XLR with LLVM now closing the performance gap with C

The recent refactoring of the XLR version of XL was intended to make it easier to maintain, but also to significantly accelerate it. The results are in, and they are amazing: XLR now handily beats unoptimized C code, and comes within 15% of optimized C code.

Up until now, XLR was using LLVM only to manipulate XL data structures, i.e. instances of the class Tree or its subclasses. This meant that when you wrote 1+2, some runtime function would receive pointers to two Integer objects holding the values 1 and 2, create a new Integer(3) object, and return it. Obviously, this had a significant runtime cost, in particular requiring a garbage collector to reclaim all the dynamically allocated objects generated even for the simplest computation.
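To make that cost concrete, here is a rough C++ sketch of what evaluating 1+2 amounted to under that model. The names Integer and xl_integer_add are illustrative only, not the actual XLR runtime entry points:

struct Tree { virtual ~Tree() {} };           // base class for all XL nodes
struct Integer : Tree
{
    explicit Integer(long long v) : value(v) {}
    long long value;
};

// Boxed addition: both operands and the result are heap-allocated nodes
// that the garbage collector must eventually reclaim.
Tree *xl_integer_add(Tree *left, Tree *right)
{
    long long l = static_cast<Integer *>(left)->value;
    long long r = static_cast<Integer *>(right)->value;
    return new Integer(l + r);                // even 1+2 allocates a new object
}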

fib 0 -> 1
fib 1 -> 1
fib N -> (fib (N-1) + fib (N-2))
fib 35

The Fibonacci code being tested

The recent refactoring work was intended to help close that gap. The primary objective was to identify the types in a program precisely enough to let LLVM do its job effectively. This refactoring work is now mostly done. Here are some encouraging numbers:

  • To compute the 35th element of the Fibonacci sequence, the old implementation took approximately 8s of CPU time on my machine.
  • To compute the same result, the new implementation only takes 0.117s. Finer-grained measurements showed that it runs about 80 to 90 times faster than an opt build of the old implementation.
  • For the 45th element of the Fibonacci sequence, XLR now takes 6.45s, whereas optimized C code takes 5.69s (compiled with gcc), less than 15% faster. Unoptimized C code runs in 17.03s, so XLR handily beats unoptimized C (a plausible version of the C benchmark is sketched below).
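For reference, the C benchmark is presumably the straightforward doubly-recursive version; the post does not show it, so this is my reconstruction, matching the XL definition above (fib 0 = fib 1 = 1):

#include <stdio.h>

long long fib(long long n)
{
    if (n < 2)
        return 1;                   // fib 0 -> 1 and fib 1 -> 1, as in the XL code
    return fib(n - 1) + fib(n - 2);
}

int main(void)
{
    printf("%lld\n", fib(45));
    return 0;
}

The exact compiler flags are not given in the post; something like gcc -O2 for the optimized run and gcc -O0 for the unoptimized one would be typical.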

So here we are: for simple code, XLR is now within 15% of optimized C code! The remaining difference is probably due to how LLVM optimizes JIT-compiled code. Some fine tuning might help, but I won't bother chasing the remaining performance tweaks myself; someone more versed in fine-tuning LLVM may have suggestions. Also, this was with today's top-of-tree (ToT) LLVM, which is not necessarily release quality.

The optimization tests are here if you want to try this out for yourself. Prepare for some crashes if you venture outside of what I tested :-)

The work is far from being done, but this is really encouraging.

Type inference: the secret sauce

(x:integer + y:integer):integer -> opcode Add
foo A -> A + A + A + 5 + 12
foo 3

Source code testing additions

define internal i64 @xl_eval_foo(i64) {
allocas:
%1 = mul i64 %0, 3
%2 = add i64 %1, 17
ret i64 %2
}

Code generated for LLVM

These results are interesting because they come from a rather big piece of infrastructure finally being implemented in XLR: type inference. This work is presently only available on the dyncompile branch of XLR.

In the Fibonacci example, XLR can figure out that fib is a function that only takes integer arguments and returns an integer result. As a result, it tells LLVM only about regular machine integers, not about Integer class instances. No runtime cost, only machine instructions.
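To illustrate the difference, here is roughly what the two signatures look like from LLVM's point of view (hypothetical names, for illustration only):

struct Tree;                          // boxed XL node

// Old model: fib receives and returns pointers to boxed Integer nodes.
Tree *xl_eval_fib_boxed(Tree *n);

// New model: with types inferred, LLVM is told that fib maps i64 to i64.
long long xl_eval_fib(long long n);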

For example, the function generated for the foo code above is the LLVM listing shown earlier. Notice how LLVM transformed the multiple additions into a multiply instruction. In the generated machine code it's even better: the multiply by 3 and the addition of a constant are combined into a single lea instruction on x86.
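In C terms, the generated function is equivalent to the following (my reading of the IR above; the lea form shown in the comment is one plausible encoding, not verified output):

// foo A -> A + A + A + 5 + 12 simplifies to 3*A + 17, exactly what the IR
// above computes. On x86-64, 3*a + 17 fits in a single instruction, e.g.
//   lea rax, [rdi + rdi*2 + 17]
long long foo(long long a)
{
    return 3 * a + 17;
}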