Performance of the ucollect master & GIL
It seems the DB is able to keep up with quite significant throughput. However, the ucollect master isn't, and the problem is the GIL. The main thread alone would be able to keep up with a high number of clients. But if we create enough background threads to feed the database, they drag the whole process down to a crawl and clients start dropping. If we don't, we waste the DB's capacity and the queue of data to submit keeps growing.
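A minimal sketch of the contention described above (the workload and sizes are made up for illustration): under CPython, pure-Python CPU-bound work holds the GIL, so running it in two threads takes roughly as long as running it sequentially, and the main (network) thread is starved whenever the feeder threads are busy.

```python
import threading
import time

def burn(n):
    """Pure-Python CPU-bound loop; holds the GIL while it runs.
    Stands in for the per-batch work a DB feeder thread would do."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

N = 500_000

# Sequential: two batches, one after the other.
seq = timed(lambda: (burn(N), burn(N)))

# Threaded: two threads, but the GIL lets only one run Python
# bytecode at a time, so wall time stays roughly the same (or
# gets worse due to lock contention).
def threaded():
    threads = [threading.Thread(target=burn, args=(N,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

th = timed(threaded)
print(f"sequential: {seq:.3f}s, threaded: {th:.3f}s")
```

Worker *processes* (e.g. via `multiprocessing`) would sidestep this, which is what the second option below amounts to.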
There are two possible courses to take:
- Rewrite it in some less performance-terrible language than Python, one with real support for threads.
- Split it up into multiple processes working side by side. There are some issues here (the global state of clients ‒ a client can't be connected to multiple servers, so its state needs to be synchronized somehow; shared timeouts), but most of the code should just work. The following approaches would be possible:
  - Have a master process that the slave processes connect to and synchronize through.
  - Have a clever proxy (maybe a replacement of soxy) that would understand the protocol a bit and always send a given client to the same backend.
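The proxy variant could keep its routing stateless by deriving the backend from the client identifier alone. A hedged sketch (the backend names and client IDs here are invented, not part of ucollect):

```python
import hashlib

# Hypothetical list of backend worker processes the proxy
# forwards connections to.
BACKENDS = ["worker-0", "worker-1", "worker-2"]

def pick_backend(client_id: bytes) -> str:
    """Deterministically map a client to one backend.

    Uses a stable digest (not Python's randomized hash()) so the
    mapping survives proxy restarts; the same client always lands
    on the same worker, so no shared routing state is needed."""
    digest = hashlib.sha1(client_id).digest()
    index = int.from_bytes(digest[:4], "big") % len(BACKENDS)
    return BACKENDS[index]

# The same client always maps to the same backend:
assert pick_backend(b"client-42") == pick_backend(b"client-42")
```

The trade-off versus a synchronizing master process: routing needs no coordination, but adding or removing a backend remaps a fraction of clients (consistent hashing would reduce that).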