cassandra-user mailing list archives

From Alex Kotelnikov
Subject Full table scan with cassandra
Date Wed, 16 Aug 2017 16:50:48 GMT

We are evaluating Cassandra as an alternative for storing a huge stream of
data coming from our customers.

Storing works quite well, and I have started to validate how retrieval
performs. We have two types of retrieval: fetching specific records and bulk
retrieval for general analysis.
Fetching a single record works like a charm, but bulk fetching does not.

With a moderately small table of ~2 million records (~10 GB of raw data), I
observed very slow bulk reads (using token(partition key) ranges). It takes
minutes to perform a full retrieval. We tried a couple of configurations
using virtual machines and real hardware, and overall it looks like it is not
possible to read all table data in a reasonable time. By reasonable I mean
that, since we have a 1 Gbit network, 10 GB can be transferred in a couple of
minutes from one server to another, and with 10+ Cassandra servers and 10+
Spark executors the total time should be even smaller.
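
For illustration, a minimal sketch of the token-range approach mentioned above: the ring is cut into contiguous sub-ranges, and each sub-range becomes one query that a separate worker can run. It assumes the cluster uses Murmur3Partitioner (signed 64-bit tokens); the keyspace, table and partition key names are hypothetical, and the code only builds ranges and CQL text, it does not talk to a cluster.

```python
# Murmur3Partitioner tokens are signed 64-bit integers (assumption: the
# cluster uses this partitioner).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(n_splits):
    """Return n_splits contiguous, non-overlapping (low, high) token ranges,
    both bounds inclusive, that together cover the whole ring."""
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    ring_size = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    ranges = []
    for i in range(n_splits):
        low = MIN_TOKEN + (ring_size * i) // n_splits
        high = MIN_TOKEN + (ring_size * (i + 1)) // n_splits - 1
        ranges.append((low, high))
    return ranges


def range_query(keyspace, table, partition_key):
    """Build the CQL text for one sub-range; the two '?' markers are meant
    to be bound to a (low, high) pair in a prepared statement."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) >= ? AND token({partition_key}) <= ?"
    )


if __name__ == "__main__":
    # Hypothetical names: keyspace "ks", table "events", partition key "id".
    print(range_query("ks", "events", "id"))
    for low, high in split_token_ring(4):
        print(low, high)
```

Equal-width ranges are the simplest choice; they ignore which node owns which tokens, so a real scan would likely align the splits with the cluster's actual token ownership.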

I tried the DataStax Spark connector. I also wrote a simple test case using
the DataStax Java driver and saw that fetching 10k records takes ~10 s, so I
assume that a "sequential" scan will take 200x longer, i.e. ~30 minutes.
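
The estimate above can be checked with a few lines of arithmetic. This sketch extrapolates the measured batch time to the whole table and compares it with the time the network alone would need. It assumes "10Gb" means 10 gigabytes (decimal) and that the link sustains its nominal 1 Gbit/s; the function name and result keys are made up for the example.

```python
def estimate_scan(total_records, batch_records, batch_seconds,
                  data_bytes, link_bits_per_second):
    """Extrapolate a single-threaded scan from one measured batch and
    compare it with the pure network transfer time for the same data."""
    batches = total_records / batch_records
    scan_seconds = batches * batch_seconds
    wire_seconds = data_bytes * 8 / link_bits_per_second
    return {
        "batches": batches,
        "scan_seconds": scan_seconds,
        "scan_minutes": scan_seconds / 60,
        "wire_seconds": wire_seconds,
        "effective_bytes_per_second": data_bytes / scan_seconds,
    }


if __name__ == "__main__":
    result = estimate_scan(
        total_records=2_000_000,       # ~2 million records in the table
        batch_records=10_000,          # size of the measured fetch
        batch_seconds=10.0,            # measured time for that fetch
        data_bytes=10 * 10**9,         # assumption: 10 GB, decimal
        link_bits_per_second=10**9,    # assumption: full 1 Gbit/s
    )
    for name, value in result.items():
        print(f"{name}: {value:.1f}")
```

With these inputs the extrapolated scan is 2000 s (about 33 minutes), against 80 s for the raw transfer, which corresponds to an effective read rate of roughly 5 MB/s.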

Maybe we are totally wrong to try to use Cassandra this way?


Best Regards,

*Alexander Kotelnikov*

*Team Lead*

Retail Technology Company

m: +7.921.915.06.28
