Hello Siddharth,

I just had a quick look at the architecture diagram. The idea of using multiple threads, one for each token range, is great. It helps maximize parallelism.
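
For anyone following along, here is a minimal sketch of the one-thread-per-token-range idea in Java. It is not the code from the cfs repository. It assumes the table uses Murmur3Partitioner (tokens span the signed 64-bit range), and the names `my_keyspace.my_table`, `id`, `split` and `query` are hypothetical. Each worker only prints the CQL it would run; in a real scan it would execute that statement through the driver and page through the rows.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class TokenRangeSplit {

    /**
     * Splits the full signed 64-bit token ring (assumption: Murmur3Partitioner)
     * into n contiguous, non-overlapping ranges. Each range is {start, end},
     * both bounds inclusive.
     */
    public static List<long[]> split(int n) {
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        // Ring size is 2^64, which overflows a long, hence BigInteger.
        BigInteger span = BigInteger.valueOf(Long.MAX_VALUE).subtract(min).add(BigInteger.ONE);
        BigInteger parts = BigInteger.valueOf(n);
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            BigInteger start = min.add(span.multiply(BigInteger.valueOf(i)).divide(parts));
            BigInteger end = min.add(span.multiply(BigInteger.valueOf(i + 1)).divide(parts))
                                .subtract(BigInteger.ONE);
            ranges.add(new long[] {start.longValueExact(), end.longValueExact()});
        }
        return ranges;
    }

    /** Builds the CQL for one range; table and partition key names are supplied by the caller. */
    public static String query(String table, String partitionKey, long[] range) {
        return "SELECT * FROM " + table
                + " WHERE token(" + partitionKey + ") >= " + range[0]
                + " AND token(" + partitionKey + ") <= " + range[1];
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 4; // degree of parallelism: one worker per token range
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (long[] range : split(threads)) {
            String cql = query("my_keyspace.my_table", "id", range); // hypothetical names
            // Stand-in for the real work: execute cql with the driver and iterate the result pages.
            pool.execute(() -> System.out.println(Thread.currentThread().getName() + " -> " + cql));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Splitting evenly by token value is the simplest choice. Aligning the sub-ranges with the token ranges each node actually owns would let every query be served by a single replica set, at the cost of reading the ring metadata first.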

With https://issues.apache.org/jira/browse/CASSANDRA-11521 it would be even faster.

On Mon, Oct 3, 2016 at 7:51 PM, siddharth verma <sidd.verma29.list@gmail.com> wrote:
Hi,
I have been working on a utility that performs a Cassandra full table scan at very high speed: Cassandra fast full table scan.
How fast?
The script dumped ~229 million rows in 116 seconds on a 6-node cluster.
Data transfer rates of up to 25 MBps were observed on the Cassandra nodes.

One of our use cases required a Spark cluster, but we were unable to set one up. In such a situation, this utility can be used to iterate through an entire table at very high speed.

Now I use it freely for any full scan in my ad hoc Java programs that manipulate or aggregate Cassandra data.

You can customize the options (fetch size, consistency level, and degree of parallelism, i.e. the number of threads) according to your needs.

You can visit https://github.com/siddv29/cfs to go through the code, see the logic behind it, or try it in your program.
A sample program is also provided.

I coded this utility in Java.

Bhuvan Rawal (bhu1rawal@gmail.com) and I worked on this concept.
For Python, you may visit his blog (http://casualreflections.io/tech/cassandra/python/Multiprocess-Producer-Cassandra-Python) and GitHub gist (https://gist.github.com/bhuvanrawal/93c5ae6cdd020de47e0981d36d2c0785).

Looking forward to your suggestions and comments.

P.S. Give it a try. Trust me, the iteration speed is awesome!!
It is a bare-bones application, built quickly. If you would like to contribute to the Java utility, or add to or build upon it, do reach out at sidd.verma29.lists@gmail.com

Thanks and Regards,
Siddharth Verma
(previous email id on this mailing list : verma.siddharth@snapdeal.com)