I am working for a client that needs to persist 100K-200K records per second for later querying. As a proof of concept, we are evaluating several options, including NoSQL stores (Cassandra and MongoDB).
I have been running some tests on my laptop (MacBook Pro, 4GB RAM, 2.66 GHz, Dual Core/4 logical cores) and have not been happy with the results.
The best I have been able to achieve is 100K records in approximately 30 seconds (roughly 3,300 records per second). Each record has 30 columns, mostly integers. I have tried both the Hector and Pelops APIs, and have tried writing in batches versus one record at a time. The times have not varied much.
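For clarity, the batch-versus-single comparison has roughly the shape below. This is a simplified sketch, not the actual test code: `BatchWriter` is a hypothetical stand-in for whatever the client library call is (it is not part of the Hector or Pelops API), and the rows are synthetic 30-integer arrays. A batch size of 1 corresponds to writing one record at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WriteBenchmark {
    /** Hypothetical stand-in for a client write call; NOT a Hector or Pelops type. */
    interface BatchWriter {
        void writeBatch(List<int[]> rows);
    }

    static final int COLUMNS = 30; // each record has 30 columns, mostly integers

    /** Writes totalRecords synthetic rows in groups of batchSize; returns elapsed nanoseconds. */
    static long run(BatchWriter writer, int totalRecords, int batchSize) {
        long start = System.nanoTime();
        List<int[]> batch = new ArrayList<>(batchSize);
        for (int id = 0; id < totalRecords; id++) {
            int[] row = new int[COLUMNS];
            Arrays.fill(row, id); // synthetic data, assumed for illustration
            batch.add(row);
            if (batch.size() == batchSize) {
                writer.writeBatch(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            writer.writeBatch(batch); // flush the final partial batch
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final long[] written = {0};
        // A no-op writer that only counts rows; swap in a real client call to measure a store.
        BatchWriter counting = rows -> written[0] += rows.size();
        for (int batchSize : new int[] {1, 100, 1000}) {
            written[0] = 0;
            long nanos = run(counting, 100_000, batchSize);
            System.out.printf("batch=%d wrote %d records in %.3f s%n",
                    batchSize, written[0], nanos / 1e9);
        }
    }
}
```

With the counting writer this only measures loop overhead; the point is that the same loop drives both the batched and the one-at-a-time runs, so any difference comes from the store or the client.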
I am using the out-of-the-box configuration for Cassandra, and while I know that using a single disk will hurt performance, I would expect to see better write numbers than I am seeing.
As a point of reference, the same test against MongoDB wrote 100K records in 3.5 seconds.
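To put the numbers side by side, here is a small sketch that converts the timings above into records per second and compares them with the low end of the 100K-200K target. The class and method names are only illustrative.

```java
public class ThroughputCheck {
    /** Records per second for a run that wrote `records` rows in `seconds`. */
    static double recordsPerSecond(long records, double seconds) {
        return records / seconds;
    }

    /** How many times faster the target rate is than the measured rate. */
    static double shortfallFactor(double targetPerSecond, double measuredPerSecond) {
        return targetPerSecond / measuredPerSecond;
    }

    public static void main(String[] args) {
        double target = 100_000; // low end of the 100K-200K requirement
        double cassandra = recordsPerSecond(100_000, 30.0); // Cassandra run from this post
        double mongo = recordsPerSecond(100_000, 3.5);      // MongoDB run from this post

        System.out.printf("Cassandra: %.0f records/s, %.1fx below target%n",
                cassandra, shortfallFactor(target, cassandra));
        System.out.printf("MongoDB:   %.0f records/s, %.1fx below target%n",
                mongo, shortfallFactor(target, mongo));
    }
}
```

On this laptop, that puts Cassandra about 30x short of the minimum target and MongoDB about 3.5x short.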
Any tips would be appreciated.