cassandra-user mailing list archives

From Renat Gilfanov <>
Subject Cassandra input paging for Hadoop
Date Wed, 11 Sep 2013 00:18:02 GMT

We have Hadoop jobs that read data from our Cassandra column families and write some data
back to other column families.
The input column families are pretty simple CQL3 tables without wide rows.
In the Hadoop jobs we set up the corresponding WHERE clause via CqlConfigHelper.setInputWhereClauses(...),
so we don't process the whole table at once.
Nevertheless, sometimes the amount of data returned by the input query is big enough to
cause TimedOutExceptions.

To mitigate this, I'd like to configure the Hadoop job in such a way that it sequentially fetches
input rows in smaller portions.

I'm looking at the ConfigHelper.setRangeBatchSize() and CqlConfigHelper.setInputCQLPageRowSize()
methods, but I'm a bit confused about whether that's what I need and, if so, which one I should use for those
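
For reference, a minimal sketch of how the two calls mentioned above might appear in the job setup. It assumes the Cassandra 1.2.x and Hadoop 1.x jars are on the classpath; the class name, the method name configureInput, and the value 100 are hypothetical and chosen only for illustration. It does not settle which of the two settings actually applies to the CQL3 input path.

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.hadoop.conf.Configuration;

public class InputPagingSketch {

    // Hypothetical helper: applies both candidate settings to a job configuration.
    static void configureInput(Configuration conf) {
        // Candidate 1: range batch size, passed as an int.
        // The value 100 is an arbitrary illustration, not a recommendation.
        ConfigHelper.setRangeBatchSize(conf, 100);

        // Candidate 2: CQL page row size, passed as a String in the 1.2.x API.
        // Again, "100" is an arbitrary illustrative value.
        CqlConfigHelper.setInputCQLPageRowSize(conf, "100");
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        configureInput(conf);
    }
}
```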

Any help is appreciated.

Hadoop version is 1.1.2, Cassandra version is 1.2.8.