Performance optimization of the modified quadtree algorithm

For e-profile we need a really fast algorithm to place a bunch of windbarbs according to a given zoom level.

We discussed this issue here and here and here.

The algorithm for E-Profile is a slightly modified quadtree algorithm, described here in detail. Update: The algorithm failed the goal. So a new algorithm was developed.

To get a decent performance you have to squeeze any ounce of performance out of the code.

So we test all our optimizations against 1 Milllion datapoints and our target is one second runtime.

		Python		Cython
	Number points	OOP	Flat numpy	Only semantic	One Class OOP	Proc w. OOP wrapper
Runtime (ms)	100000	21335	9500	997	200	115
	1000000	274988	109000	11780	2900	1380

BaseMemory (MB)	100000	103.2	80.9	84.8	81.1	84.6
	1000000	326.8	95.1	98.6	95.1	99.2

Memory total	100000	231.9	102.2	124.9	183.4	100.9
	1000000	1618.5	314.6	511.6	1134.9	272.9

The base case is the Python OOP Code. It take 21 seconds for 100000 datapoints, far to slow for our need.

An alternative implementation in Numpy utilizing numpy Integers in favour to python Pointers reduces the memory consumption tremendously. Also the runtime is roughly halved or tripeled.

Ich we take the base case code and do just semantic changes in Cython notation via a pxd file we will gain ten times the speed of the numpy solution. Cool!

Then we can optimize the cython code even more. We combine the three classes in the base case into one class and use for this class a Cython cdef class we speed up another factor of five. Really impressive.

But the fastest and most memory efficient solution is the write the algorithm in plain procedural Cython utilizing a Struct to represent the quadtree structure. In addition we use a Cython cdef class to present the cython procedures as member of a Python class. And with this optimization we arrive at our optimization goal: 1 Mio datapoints processed in 1.4 seconds.