Rust version has caught up to the C one

It took a while, but the Rust version of the code generating the positions of galaxies has finally reached the level of functionality of the C version. Meanwhile, I have gathered quite a bit of interesting experience, which I'm now going to share.

Rust vs other languages

Programming in Rust is nothing like programming in any other language I've had contact with (which means mainly the C family and Python). One remotely similar experience was experimenting with Haskell, but even that generally due to incompatibility between my intuition and the language (although Rust has many functional features, but as will be mentioned later, one shouldn't overuse them...).

The feature that differentiates Rust from other languages the most is its concept of ownership. Every value has a binding that owns it. When the owner goes out of scope, the memory it was using is automatically deallocated. Since every value can only have one owner and each binding goes out of scope sooner or later, problems like memory leaks or double deallocation disappear.

In order to be able to refer to a value from more than one place, one can "borrow" it. You can borrow a value immutably multiple times (but, as the name suggests, you won't be able to make any changes to it), or only once mutably. The purpose is to make it impossible for multithreaded programs to simultaneously modify the same data (or even to read data which is being modified by another thread).

The rules are simple enough, but it turns out that many structures known from other languages become difficult or even impossible to code.

Cyclic data structures in Rust

In so-called "safe" Rust (which means observing the rules described above - you can turn them off for some fragments of the code) cyclic data structures mainly don't exist. You can see that in a simple example:
Let A refer to B. We want to save a reference to A in B (for example, because we want a doubly linked list or a tree traversible both up and down). Because A already has a reference to B, we can't mutably borrow B - it would contradict the rules of borrowing. The only way would be to simultaneously create A and B, but there is no such possibility in Rust.

Actually, the situation is not really so hopeless. There are data structures allowing to enforce borrowing rules dynamically at runtime. Using them, one can temporarily obtain a mutable reference to an object already borrowed in another way. I don't really understand this topic, so I won't be going into more detail.

Octrees

One of the data structures I'm using in the Universe project is an octree. I use octrees to subdivide the space and decide where the generated objects are supposed to be. So, I wanted to code octrees in Rust.

At the very beginning I encountered the problems I described above - I wasn't able to keep a reference to the parent in child nodes. It would appear that I lost the ability to go up in the tree, which would mean not being able to focus on bigger fragments of space after focusing on smaller ones. Fortunately, I remembered one thing after a while.

When I was experimenting with Haskell, I read about coding trees in this language. In the article there was also a description of a structure enabling one to traverse a tree - a zipper. A zipper contains two main elements - the node we are focused on and a list of parent nodes along with side branches. Going down in the tree means separating the child node and pushing the parent to the end of the list, and going up - taking a parent from the list and rebuilding the node.

So I decided to code a zipper in Rust and allow myself to travel freely around the octree. My intuition failed me again and I had to change my approach multiple times, until finally I settled on something resembling the functional style - the operations on the tree and the zippers created modified copies of the structures and returned them, leaving original ones untouched. This way I didn't have to worry about borrowing rules (since you can borrow immutably without limits).

When the octree was finished, the rest was much easier. After a relatively short time I could start testing and comparing results to the C version.

The efficiency of the functional style

I wrote a short test progam - it defined the seed for the universe, its size, generated a root node (containing the number of all galaxies) and asked the tree for the list of galaxies in a small block.

First execution... the result was displayed after a few seconds. That's not good. The C version did the same in the blink of an eye. Slightly depressed that my work has gone to waste, I started looking for the cause.

I launched valgrind - a profiler. I had my suspicions which turned out to be correct - the side effect of using the functional styles was millions, if not billions, of allocations and deallocations. Each operation on the tree meant copying all the nodes, which contained various additional data. Everything together was taking a long time.

A friend of mine has confirmed my suspicions, so I started modifying the code in such a way that structures would be mutated in place instead of being copied. After a hard fight with the compiler, which was complaining a lot about me illegally borrowing values, I managed to do it. The moment of truth - execute... 0,5 seconds.

A lot better, but that's still quite a bit behind C, which was executing in 0,1 seconds. Something was still wrong. Then I realized something.

I use multiple-precision numbers from the GMP library a lot in the program. In order to use it from Rust, I needed a wrapper: rust-gmp. This wrapper defines for example the mathematical operations, so that I can write "c = a + b" instead of "__gmpz_add(c, a, b)". Well, those operations were creating a new number object each time they were executed, so that they could return it as the result. This is ok when the arguments have to remain unmodified and are only borrowed, but sometimes they are only some intermediate result and I can "consume" them freely. Then, instead of allocating new memory for the result, I can reuse that occupied by one of the arguments.

When I modified the wrapper library accordingly and rewrote the mathematical expressions to use this optimization as much as possible, I ran the test again. 0,25 seconds. Closer.

I launched valgrind again. Now the place in the code taking the most time was a function which approximates the integral of the density of objects. Looking closer, I realized that every call to this function created 8 new objects representing points in space, each of which contained 3 GMP numbers. This meant 24 allocations and deallocations of memory. What's more, this function is being called at each subdivision of a node in the octree, so 8 times... And the nodes are being subdivided many, many times... The number of operations was growing a lot as a result of this.

The solution was simple - I changed the density function to accept references to existing points, instead of needing copies, and... 0,13 seconds. Almost at the C level :)

What's next?

This is the moment I stopped at. The possibilities of easy optimization are exhausted, but everything seems pretty good. What's really optimistic is that parallel processing is supposedly easy in Rust, so with a bit of luck I'll be able to take advantage of that and accelerate the code even more. I have 8 cores in my computer, which means that the time of execution can even fall below 0,02 seconds. It's actually quite possible that this will outperform C :)

Among other things, what's left is the further development of the project. The code generating galaxies in given space is ready, now I need to divide those galaxies into smaller parts and generate stars. Then I'll also be able to think about some visualisation, so that it will be finally possible to see the results.

The test application

Below are two links to the test application. It allows you to enter a seed for the universe (which determines its properties) and coordinates of two corners of a block. The program then computes the coordinates of the galaxies contained in this block and displays them.

Note: The Linux version requires the GMP and MPFR libraries to be installed in the system.

Download “Test Universe (Win32)” test-universe.zip – Downloaded 983 times – 1.28 MB
Download “Test Universe (Linux amd64)” test_universe-linux.tar.gz – Downloaded 901 times – 275.83 KB