cassandra-user mailing list archives

From "Marcelo Valle (BLOOMBERG/ LONDON)" <>
Subject Re:How to speed up SELECT * query in Cassandra
Date Wed, 11 Feb 2015 10:40:07 GMT
Look for the thread "Re: Fastest way to map/parallel read all values in a table?" on this
mailing list; this was discussed recently. You can run several parallel processes, each one
reading a slice of the data, by splitting the min/max Murmur3 hash (token) range.
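
A rough sketch of that idea in Python, not a tested production setup. It assumes the cluster
uses the Murmur3 partitioner (token range -2**63 to 2**63 - 1). The table name
`events_20150211`, the partition key `id`, and the `execute` callable are made up for
illustration; in practice `execute` would wrap whatever driver session you use.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds of the Murmur3 partitioner (assumed to be the one in use).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ranges(n):
    """Split the full token range into n contiguous, non-overlapping
    slices. Each slice is an inclusive (start, end) pair."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    bounds = [MIN_TOKEN + (total * i) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n)]


def slice_query(table, partition_key, start, end):
    """Build a CQL query that reads only the rows whose partition key
    hashes into [start, end]. Table and key names are hypothetical."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({partition_key}) >= {start} "
        f"AND token({partition_key}) <= {end}"
    )


def read_in_parallel(table, partition_key, execute, slices=16, workers=4):
    """Run one query per token slice on a thread pool.

    `execute` is a caller-supplied function that takes a CQL string and
    returns an iterable of rows (a placeholder for a real driver call).
    Returns one list of rows per slice, in slice order."""
    queries = [
        slice_query(table, partition_key, start, end)
        for start, end in split_token_ranges(slices)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [list(rows) for rows in pool.map(execute, queries)]
```

The slices cover the whole ring exactly once, so every row is read by exactly one query.
Separate processes (or machines) can each take a subset of the slices instead of threads
if a single client becomes the bottleneck.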

At the company where I used to work, we developed a system that runs custom Python processes
on demand to process Cassandra data, among other things, so that we could do exactly that. I
hope it will be released as open source soon; it seems a lot of people keep running into this
same problem.

If you use Cassandra enterprise, you can use Hive, AFAIK. Another good option would be to run
a Hadoop or Spark job over your cluster and do the processing you want there, but I think it
can sometimes be hard to get good results that way, mainly because these tools work fine but
are "auto magic". It's hard to control where intermediate data will be stored, for example.

Subject: Re:How to speed up SELECT * query in Cassandra

Is there a simple way (or even a complicated one) to speed up SELECT * FROM [table]?
I need to get all rows from one table every day. I split the tables and create one for each
day, but the query is still quite slow (200 million records).

I was thinking about running this query in parallel, but I don't know if that is possible.
