Performance optimization of the modified quadtree algorithm
For e-profile we need a really fast algorithm to place a bunch of windbarbs according to a given zoom level.
We discussed this issue here and here and here.
The algorithm for E-Profile is a slightly modified quadtree algorithm, described here in detail. Update: The algorithm failed the goal. So a new algorithm was developed.
To get a decent performance you have to squeeze any ounce of performance out of the code.
So we test all our optimizations against 1 Milllion datapoints and our target is one second runtime.
Python | Cython | |||||
Number points | OOP | Flat numpy | Only semantic | One Class OOP | Proc w. OOP wrapper | |
Runtime (ms) | 100000 | 21335 | 9500 | 997 | 200 | 115 |
1000000 | 274988 | 109000 | 11780 | 2900 | 1380 | |
BaseMemory (MB) | 100000 | 103.2 | 80.9 | 84.8 | 81.1 | 84.6 |
1000000 | 326.8 | 95.1 | 98.6 | 95.1 | 99.2 | |
Memory total | 100000 | 231.9 | 102.2 | 124.9 | 183.4 | 100.9 |
1000000 | 1618.5 | 314.6 | 511.6 | 1134.9 | 272.9 |
The base case is the Python OOP Code. It take 21 seconds for 100000 datapoints, far to slow for our need.
An alternative implementation in Numpy utilizing numpy Integers in favour to python Pointers reduces the memory consumption tremendously. Also the runtime is roughly halved or tripeled.
Ich we take the base case code and do just semantic changes in Cython notation via a pxd file we will gain ten times the speed of the numpy solution. Cool!
Then we can optimize the cython code even more. We combine the three classes in the base case into one class and use for this class a Cython cdef class we speed up another factor of five. Really impressive.
But the fastest and most memory efficient solution is the write the algorithm in plain procedural Cython utilizing a Struct to represent the quadtree structure. In addition we use a Cython cdef class to present the cython procedures as member of a Python class. And with this optimization we arrive at our optimization goal: 1 Mio datapoints processed in 1.4 seconds.