incubator-cassandra-user mailing list archives

From "John R. Frank" <...@mit.edu>
Subject loading all rows from cassandra using multiple (python) clients in parallel
Date Mon, 22 Apr 2013 12:15:59 GMT
Cassandra Experts,

I understand that when using Cassandra's recommended RandomPartitioner (or 
Murmur3Partitioner), it is not possible to do meaningful range queries on 
keys, because rows are distributed around the cluster by a hash of the key 
(MD5 for RandomPartitioner, MurmurHash3 for Murmur3Partitioner).  These 
hashes are called "tokens."

Nonetheless, it would be very useful to split up a large table amongst 
many compute workers by assigning each a range of tokens.  Using CQL3, it 
appears possible to issue queries directly against the tokens; however, the 
Python posted at the following link does not work:

http://stackoverflow.com/questions/16137944/loading-all-rows-from-cassandra-using-multiple-python-clients-in-parallel
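
To make the idea concrete, here is a minimal sketch of the split I have in 
mind, independent of any client library.  It only computes per-worker token 
ranges and builds the CQL3 text; it is not the code from the link above.  The 
token bounds are my assumptions about each partitioner's range, and the 
table and column names ("my_table", "key") are hypothetical.

```python
# Assumed token bounds (not taken from the original post):
#   Murmur3Partitioner: signed 64-bit integers
#   RandomPartitioner:  -1 (minimum token) up to 2**127
MURMUR3_MIN, MURMUR3_MAX = -(2 ** 63), 2 ** 63 - 1
RANDOM_MIN, RANDOM_MAX = -1, 2 ** 127


def split_token_range(num_workers, lo=MURMUR3_MIN, hi=MURMUR3_MAX):
    """Return num_workers contiguous (start, end] ranges covering (lo, hi]."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    span = hi - lo
    # Integer arithmetic keeps the bounds exact even for 127-bit tokens.
    bounds = [lo + (span * i) // num_workers for i in range(num_workers + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(table, key_column, start, end):
    """Build CQL3 text selecting rows whose token lies in (start, end]."""
    return "SELECT * FROM %s WHERE token(%s) > %d AND token(%s) <= %d" % (
        table, key_column, start, key_column, end)


if __name__ == "__main__":
    # Hypothetical table and key column names, for illustration only.
    for start, end in split_token_range(4):
        print(range_query("my_table", "key", start, end))
```

Each worker would then run its own query with whichever client it uses.  
Half-open (start, end] ranges are used so that no row is read by two 
workers.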

I would ideally like to make this work with pycassa, because I prefer its 
more pythonic interface.

Am I just not invoking CQL3 correctly through the cql package?

Is there a better way to do this?


Thanks for any pointers!

John




