merge-data.py has outrageous memory requirements
Example:
- Shotgun output from 16 threads
- running for 25 hours
- individual JSONs produced by threads consume 5.8 GiB storage
- peak memory usage is 23.3 GiB when running merge-data.py
I think we should rework the data format so merge-data.py can process records as a stream instead of loading all the data into memory at once.
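As a rough illustration of the idea: if each thread wrote its output as JSON Lines (one record per line) rather than a single large JSON document, merging would only ever hold one record in memory at a time. This is a minimal sketch, not the actual merge-data.py logic; the `stream_merge` function and the per-thread file layout are assumptions for the example.

```python
import json
import tempfile
from pathlib import Path

def stream_merge(input_paths, output_path):
    """Merge JSON Lines files record by record.

    Memory use stays bounded by the size of a single record,
    regardless of how large the inputs are in total.
    """
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    record = json.loads(line)  # validate each record individually
                    out.write(json.dumps(record) + "\n")

# Demo with synthetic per-thread files (stand-ins for the shotgun outputs)
tmpdir = Path(tempfile.mkdtemp())
inputs = []
for t in range(3):
    p = tmpdir / f"thread-{t}.jsonl"
    p.write_text(
        "\n".join(json.dumps({"thread": t, "i": i}) for i in range(4)) + "\n"
    )
    inputs.append(p)

merged = tmpdir / "merged.jsonl"
stream_merge(inputs, merged)
```

With this layout, peak memory no longer scales with the 5.8 GiB of input; the trade-off is that any cross-record aggregation would need its own streaming pass or an external-memory approach.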