cassandra-user mailing list archives

From DuyHai Doan <>
Subject Re: An extremely fast cassandra table full scan utility
Date Mon, 03 Oct 2016 18:10:56 GMT
Hello Siddharth

I just had a quick look at the architecture diagram. The idea of using
multiple threads, one for each token range, is great. It helps max out

With it would be even
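A minimal sketch of the "one thread per token range" idea follows. It assumes the default Murmur3Partitioner, whose tokens are signed 64-bit values from -2^63 to 2^63-1. The names TokenRangeSplitter, split and scanAll are hypothetical and not taken from the utility itself. The per-range query is passed in as a function so the sketch runs without a cluster. In practice each range would typically be read with a query of the form SELECT ... WHERE token(pk) > ? AND token(pk) <= ?.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.function.ToLongFunction;

// Hypothetical helper, not the utility's actual code. Assumes Murmur3Partitioner,
// whose tokens span the signed 64-bit range [Long.MIN_VALUE, Long.MAX_VALUE].
public final class TokenRangeSplitter {

    /** One contiguous slice of the token ring: start is exclusive, end is inclusive. */
    public static final class Range {
        public final long start;
        public final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "(" + start + ", " + end + "]";
        }
    }

    /** Splits (Long.MIN_VALUE, Long.MAX_VALUE] into n contiguous, non-overlapping ranges. */
    public static List<Range> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger span = BigInteger.valueOf(Long.MAX_VALUE).subtract(min); // 2^64 - 1
        List<Range> ranges = new ArrayList<>(n);
        long prev = Long.MIN_VALUE;
        for (int i = 1; i <= n; i++) {
            // Boundary i = min + span * i / n, computed in BigInteger to avoid long overflow.
            long next = min.add(span.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(n))).longValueExact();
            ranges.add(new Range(prev, next));
            prev = next;
        }
        return ranges;
    }

    /**
     * Scans every range on its own pool thread and sums the per-range row counts.
     * rangeScanner stands in for the real per-range query (hypothetical), e.g.
     * SELECT ... WHERE token(pk) > ? AND token(pk) <= ? bound to (start, end].
     */
    public static long scanAll(int parallelism, ToLongFunction<Range> rangeScanner) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<Long>> futures = new ArrayList<>();
            for (Range r : split(parallelism)) {
                Supplier<Long> task = () -> rangeScanner.applyAsLong(r);
                futures.add(CompletableFuture.supplyAsync(task, pool));
            }
            long total = 0;
            for (CompletableFuture<Long> f : futures) {
                total += f.join(); // a failed range surfaces here as a CompletionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

This sketch creates exactly as many ranges as threads to stay short. A real scanner may prefer more, smaller ranges than threads, or ranges aligned with the ranges each node owns, so that one slow range does not leave the other threads idle.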

On Mon, Oct 3, 2016 at 7:51 PM, siddharth verma <> wrote:

> Hi,
> I was working on a utility that can scan a full Cassandra table at very
> high speed: cassandra fast full table scan.
> How fast?
> The script dumped ~229 million rows in 116 seconds on a 6-node cluster.
> Data transfer rates of up to 25 MBps were observed on the Cassandra nodes.
> For one use case a Spark cluster was required, but for some reason we
> couldn't create one. In such cases, this utility can be used to iterate
> through the entire table at very high speed.
> Now I use it for any full scan in my ad hoc Java programs that
> manipulate or aggregate Cassandra data.
> You can customize the options (fetch size, consistency level and degree
> of parallelism, i.e. the number of threads) according to your needs.
> You can visit to go through the code, see
> the logic behind it, or try it in your program.
> A sample program is also provided.
> I wrote this utility in Java.
> Bhuvan Rawal( and I worked on this concept.
> For Python, you may visit his blog
> (http://casualreflections.io/tech/cassandra/python/Multiprocess-Producer-Cassandra-Python)
> and github(
> 85)
> Looking forward to your suggestions and comments.
> P.S. Give it a try. Trust me, the iteration speed is awesome!!
> It is a bare-bones application, built quickly. If you would like to
> contribute to the Java utility or build on it, do reach out.
> Thanks and Regards,
> Siddharth Verma
> (previous email id on this mailing list :
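The tunable options mentioned in the quoted message (fetch size, consistency level, degree of parallelism) could be grouped roughly as below, together with the per-range CQL each thread would run. This is only a sketch. ScanOptions and rangeQuery are hypothetical names, not the utility's API. The consistency level is kept as a plain string so the code needs no driver dependency, and the partition key columns are assumed to be known. With the DataStax Java driver 3.x, fetch size and consistency level would typically be applied per statement through Statement.setFetchSize and Statement.setConsistencyLevel.

```java
import java.util.List;

// Hypothetical settings holder; field and method names are illustrative only.
public final class ScanOptions {
    public final int fetchSize;           // rows per page the driver requests
    public final String consistencyLevel; // e.g. "ONE" or "LOCAL_ONE"
    public final int parallelism;         // worker threads, one token sub-range each

    public ScanOptions(int fetchSize, String consistencyLevel, int parallelism) {
        if (fetchSize < 1 || parallelism < 1) {
            throw new IllegalArgumentException("fetchSize and parallelism must be >= 1");
        }
        this.fetchSize = fetchSize;
        this.consistencyLevel = consistencyLevel;
        this.parallelism = parallelism;
    }

    /**
     * Builds the per-range CQL. The two bind markers take a range's exclusive start
     * and inclusive end token; token() must list every partition key column in order.
     */
    public static String rangeQuery(String keyspace, String table, List<String> partitionKey) {
        String pk = String.join(", ", partitionKey);
        return "SELECT * FROM " + keyspace + "." + table
                + " WHERE token(" + pk + ") > ? AND token(" + pk + ") <= ?";
    }
}
```

For example, rangeQuery("ks", "events", Arrays.asList("tenant_id", "day")) yields a query with two bind markers for token(tenant_id, day). A larger fetch size means fewer round trips per range but more memory per page. A low consistency level such as ONE or LOCAL_ONE keeps each read cheap during a bulk scan, at the risk of reading a stale replica.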
