Performance optimization of the modified quadtree algorithm

For e-profile we need a really fast algorithm to place a bunch of windbarbs according to a given zoom level.

We discussed this issue here and here and here.

The algorithm for E-Profile is a slightly modified quadtree algorithm, described here in detail. Update: The algorithm failed the goal. So a new algorithm was developed.

To get a decent performance you have to squeeze any ounce of performance out of the code.

So we test all our optimizations against 1 Milllion datapoints and our target is one second runtime.

    Python   Cython    
  Number points OOP Flat numpy Only semantic One Class OOP Proc w. OOP wrapper
Runtime (ms) 100000 21335 9500 997 200 115
  1000000 274988 109000 11780 2900 1380
             
BaseMemory (MB) 100000 103.2 80.9 84.8 81.1 84.6
  1000000 326.8 95.1 98.6 95.1 99.2
             
Memory total 100000 231.9 102.2 124.9 183.4 100.9
  1000000 1618.5 314.6 511.6 1134.9 272.9

 

The base case is the Python OOP Code. It take 21 seconds for 100000 datapoints, far to slow for our need.

An alternative implementation in Numpy utilizing numpy Integers in favour to python Pointers reduces the memory consumption tremendously. Also the runtime is roughly halved or tripeled.

Ich we take the base case code and do just semantic changes in Cython notation via a pxd file we will gain ten times the speed of the numpy solution. Cool!

Then we can optimize the cython code even more. We combine the three classes in the base case into one class and use for this class a Cython cdef class we speed up another factor of five. Really impressive.

But the fastest and most memory efficient solution is the write the algorithm in plain procedural Cython utilizing a Struct to represent the quadtree structure. In addition we use a Cython cdef class to present the cython procedures as member of a Python class. And with this optimization we arrive at our optimization goal: 1 Mio datapoints processed in 1.4 seconds.