incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Reading all rows in a column family in parallel
Date Thu, 08 Jul 2010 14:10:20 GMT
"CFRR does this.  Is this possible?"

I guess I don't understand the question. :)

On Thu, Jul 8, 2010 at 2:21 AM, Brent N. Chun <bnc@nutanix.com> wrote:
> Hello,
>
> I'm running Cassandra 0.6.0 on a cluster and have an application that needs
> to read all rows from a column family using the Cassandra Thrift API.
> Ideally, I'd like to be able to do this by having all nodes in the cluster
> read in parallel (i.e., each node reads a disjoint set of rows that are
> stored locally). I should also mention that I'm using the RandomPartitioner.
>
> Here's what I was thinking:
>
>  1. Have one node invoke describe_ring to find the token range on the ring
> that each node is responsible for.
>
>  2. For each token range, have the node that owns that portion of the ring
> read the rows in that range using a sequence of get_range_slices calls
> (using start/end tokens, not keys).
>
> This type of functionality seems to already be there in the tree with the
> recent Cassandra/Hadoop integration.
>
> ...
> KeyRange keyRange = new KeyRange(batchRowCount)
>        .setStart_token(startToken)
>        .setEnd_token(split.getEndToken());
> try
> {
>    rows = client.get_range_slices(new ColumnParent(cfName),
>           predicate,
>           keyRange,
>           ConsistencyLevel.ONE);
>    ...
>
>    // prepare for the next slice to be read
>    KeySlice lastRow = rows.get(rows.size() - 1);
>    IPartitioner p = DatabaseDescriptor.getPartitioner();
>    byte[] rowkey = lastRow.getKey();
>    startToken = p.getTokenFactory().toString(p.getToken(rowkey));
> ...
>
> The above snippet from ColumnFamilyRecordReader.java seems to suggest it is
> possible to scan an entire column family by reading disjoint sets of rows
> using token-based range queries (as opposed to key-based range queries). Is
> this possible in 0.6.0? (Note: for the next startToken, I was just planning
> on computing the MD5 digest of the last key directly since I'm accessing
> Cassandra through Thrift.)
>
> Thoughts?
>
> bnc
>
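Brent's note about computing the next startToken on the client can be sketched as follows. This is a minimal sketch, not Cassandra's own code. It assumes that RandomPartitioner's token is the absolute value of the key's MD5 digest read as a signed big-endian integer, that the token string passed in a KeyRange is that value in decimal, and that keys are UTF-8 encoded. `RingToken` is a hypothetical helper name. Check these assumptions against the partitioner source for the version you run.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical client-side helper approximating a RandomPartitioner token.
 * ASSUMPTIONS: keys are UTF-8, and the server computes abs(signed MD5 digest);
 * verify both against your Cassandra version before relying on this.
 */
final class RingToken {
    private RingToken() {}

    /** MD5 digest read as a signed big-endian integer, made non-negative: [0, 2^127]. */
    static BigInteger of(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Decimal form, assumed to be what KeyRange's start_token/end_token expect. */
    static String asTokenString(String key) {
        return of(key).toString();
    }

    public static void main(String[] args) {
        // The next slice would start from the token of the last key returned.
        System.out.println(asTokenString("row42"));
    }
}
```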

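Steps 1 and 2, with token-based paging as in the ColumnFamilyRecordReader excerpt, can be simulated in memory to check that disjoint ranges read in parallel cover every row exactly once. This is a hedged sketch with no Thrift client involved: `RingScan`, `sliceRange` (standing in for get_range_slices with a token KeyRange), `sampleRanges` (standing in for describe_ring output) and the synthetic integer tokens are all hypothetical. Ranges are treated as start-exclusive and end-inclusive, with start >= end meaning the range wraps around the ring.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * In-memory simulation of a parallel token-range scan. HYPOTHETICAL stand-ins:
 * sliceRange() plays get_range_slices, sampleRanges() plays describe_ring,
 * and tokens are synthetic integers. Ranges are (start, end]; start >= end wraps.
 */
final class RingScan {
    private RingScan() {}

    static boolean inRange(BigInteger t, BigInteger start, BigInteger end) {
        return start.compareTo(end) < 0
                ? t.compareTo(start) > 0 && t.compareTo(end) <= 0
                : t.compareTo(start) > 0 || t.compareTo(end) <= 0; // wrapping (or whole-ring) range
    }

    /** Up to count rows with tokens in (start, end], walking the ring in token order. */
    static List<Map.Entry<BigInteger, String>> sliceRange(
            TreeMap<BigInteger, String> rows, BigInteger start, BigInteger end, int count) {
        List<Map.Entry<BigInteger, String>> page = new ArrayList<>();
        for (NavigableMap<BigInteger, String> part : List.of(rows.tailMap(start, false), rows.headMap(start, true))) {
            for (Map.Entry<BigInteger, String> e : part.entrySet()) {
                if (page.size() == count || !inRange(e.getKey(), start, end)) return page;
                page.add(e);
            }
        }
        return page;
    }

    /** Pages through one range, restarting each slice after the last token returned. */
    static List<String> scanRange(TreeMap<BigInteger, String> rows, BigInteger start, BigInteger end, int batch) {
        if (batch < 1) throw new IllegalArgumentException("batch must be >= 1");
        List<String> keys = new ArrayList<>();
        BigInteger cursor = start;
        while (true) {
            List<Map.Entry<BigInteger, String>> page = sliceRange(rows, cursor, end, batch);
            for (Map.Entry<BigInteger, String> e : page) keys.add(e.getValue());
            if (page.size() < batch) return keys;         // range exhausted
            cursor = page.get(page.size() - 1).getKey();  // token of the last row read
            if (cursor.equals(end)) return keys;          // (end, end] would mean the whole ring
        }
    }

    /** Scans each range on its own thread, as each owning node would in parallel. */
    static List<String> scanAll(TreeMap<BigInteger, String> rows, List<BigInteger[]> ranges, int batch) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, ranges.size()));
        try {
            List<Future<List<String>>> parts = new ArrayList<>();
            for (BigInteger[] r : ranges) parts.add(pool.submit(() -> scanRange(rows, r[0], r[1], batch)));
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : parts) all.addAll(f.get());
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** 1000 rows with distinct synthetic tokens in [0, 10007). */
    static TreeMap<BigInteger, String> sampleRows() {
        TreeMap<BigInteger, String> rows = new TreeMap<>();
        for (int i = 0; i < 1000; i++) rows.put(BigInteger.valueOf(i * 7919L % 10007), "row" + i);
        return rows;
    }

    /** Four contiguous ranges covering the ring once; (7500, 0] wraps. */
    static List<BigInteger[]> sampleRanges() {
        long[] b = {0, 2500, 5000, 7500};
        List<BigInteger[]> ranges = new ArrayList<>();
        for (int k = 0; k < b.length; k++) {
            long prev = b[(k + b.length - 1) % b.length];
            ranges.add(new BigInteger[] {BigInteger.valueOf(prev), BigInteger.valueOf(b[k])});
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(scanAll(sampleRows(), sampleRanges(), 7).size()); // prints 1000
    }
}
```

Because the ranges are disjoint and together cover the ring, each row comes back exactly once. The explicit `cursor.equals(end)` check matters under the assumed convention that a range whose start equals its end covers the whole ring. Paging by token also assumes no two keys share a token; with real MD5 tokens that is very unlikely but not impossible.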


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
