Generating the structure of the universe

The structure of space

When creating a universe, you have to begin somewhere. A good starting point is to define its shape.

The most convenient and probably the most obvious shape is a cube. Each of the three coordinates is then a number from the same range and it is very easy to make it so that there are no boundaries - it is enough to add a condition that leaving the cube on one side is equivalent to entering it on the other side. It is similar to the well-known Snake game, where the snake leaving the screen on the right returned from the left - just in three dimensions.

So we have a cubic space made of points described by three numbers: x,\; y,\; z\; \in (a,b). A question arises: what data type will be best for representing the coordinates of a point? The answer requires us to check the size of the numbers involved first.

We would like a universe of a realistic size - which gives the side of the cube on the order of 10^{11} light years. A year is about 30 000 000 (3 \times 10^7) seconds, a light second is about 300 000 000 (3 \times 10^8) meters, which gives the size on the order of 10^{27} meters. It would be nice, if the elementary parts of the universe were smaller than a meter (let's say, a millimeter) - which means that we have to handle numbers with 30 significant digits. That's a lot.

For such large numbers floating-point data types are often used, but they will not be right in this case. Because such numbers store a constant number of significant digits and the position of the decimal point, their accuracy depends on their size. For example, a 64-bit floating-point number can store 15 significant digits - which means that for numbers on the order of a billion (10^9) they have an accuracy of about one millionth (10^{-6}), but for numbers around 10^{15} the result is only accurate up to a unit.

It is clear that some library for multiple precision numbers will be needed. I decided to use GMP and describe the coordinates with integers (which, with appropriately placed decimal point, can be interpreted as fixed-point real numbers). Why not floating-point? Even with precision appropriate for the largest numbers, the accuracy would vary across space. I prefer to have constant accuracy and not risk discrepancies between different places, even if it means having lower accuracy near 0. Of course in this case the coordinates won't mean the number of meters from the origin, but rather the number of some arbitrary units. I defined the meter to be 65536 units, which allows me to treat the last 16 bits of a number as digits after the decimal point.

Generating the galaxies

Having defined the space, we can start to consider the galaxies.

The module generating the galaxies will have to be able to answer two main questions:

  • What are the positions of galaxies in a given fragment of space?
  • Are the given coordinates of a galaxy correct?

The second question will be mostly for validating some external data, but it will also be useful.

The solution to the first question looks simple enough - it should suffice to take the given range of coordinates, generate some pseudorandom numbers from this range, starting from some predetermined seed and that's it. But what if we take two intersecting blocks of space? How to make sure that we will get the same galaxies in their common part? What will help is octrees. The algorithm will look like this:

  1. Take a cube corresponding to the whole space and generate a number of galaxies contained in it (how - it's a separate question, it might for example be a pseudorandom number generated from the seed for the universe)
  2. Generate a seed for the current cube (for example, by hashing its coordinates)
  3. Divide the cube in 8 parts (dividing each side in two)
  4. For each part, assign to it some probability proportional to some galaxy density function (for a uniform distibution - just 1/8)
  5. For each cube, generate the number of galaxies in it:
    • Calculate the success probability p, dividing probability for the cube by the sum of probabilities for it and the remaining cubes
    • Generate a number from the binomial distribution with parameters - N = the number of remaining galaxies, p = the probability generated in the previous step
    • Subtract the resulting number of galaxies from the total and repeat for other cubes
  6. If any of the smaller blocks has sides of length 1 and contains only 1 galaxy, return it as the position of a galaxy.
  7. Repeat steps starting from 2 for the cubes having nonzero number of galaxies and nonempty intersection with the given fragment of space

Since in this algorithm we always divide by 2, it will be convenient to have a power of 2 as the length of the side of the universe. The size of the universe will thus be described by the number n, meaning the side of 2^{n-16} meters (reminder: 1 meter is 65536 = 2^{16} units).

Because we always start from the whole universe and progress to smaller and smaller parts, we will always get the same galaxies in the same blocks, no matter what the given fragment was. The only important thing is that the content of a given block be determined only by its own properties - but this can easily be achieved by using a value connected to it for a seed (for example, a hash of the coordinates). The algorithm may seem slow, because it always requires dividing the universe in 2 all the way to the lowest level, but even for realistic universes it gives about 100 divisions until the blocks have size of 1 unit. The only problem might appear when the given fragment is too big and the number of galaxies in it is very large, but this can be avoided by not generating galaxies in large blocks of space.

It is worth noting, that the same algorithm can be used for other objects, like stars - we will only need a proper density function. This is a thing for the future, though.

Implementation

For now the algorithm described above is implemented in C. I've discovered a new interesting language though, namely Rust. Since it doesn't allow memory leaks and generates code with performance similar to C/C++, I decided to rewrite the algorithm in it. I'm currently working on a library of Rust bindings to MPFR (it's needed for the binomial distribution).

The next post will appear when the galaxy generation in Rust is ready :)