We have Hadoop jobs that read data from our Cassandra column families and write some data back to other column families.
The input column families are pretty simple CQL3 tables without wide rows.
In the Hadoop jobs we set up a corresponding WHERE clause via ConfigHelper.setInputWhereClauses(...), so we don't process the whole table at once.
Nevertheless, the amount of data returned by the input query is sometimes big enough to cause TimedOutExceptions.
To mitigate this, I'd like to configure the Hadoop job so that it fetches input rows sequentially, in smaller portions.
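Conceptually, the behavior I'm after looks like the following. This is only a simplified, self-contained sketch: an in-memory TreeMap stands in for the input table, and the class and method names (PagedFetchSketch, fetchPage, readAllInPages) are hypothetical. It does not use the Cassandra or Hadoop APIs; it just shows key-based paging, where each request resumes after the last key of the previous page and returns at most pageSize rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for the input table.
// It does NOT use the Cassandra or Hadoop APIs.
public class PagedFetchSketch {

    // Returns at most pageSize keys strictly greater than afterKey
    // (or from the first key when afterKey is null).
    static List<String> fetchPage(TreeMap<String, String> table, String afterKey, int pageSize) {
        List<String> page = new ArrayList<>();
        Iterable<String> keys = (afterKey == null)
                ? table.navigableKeySet()
                : table.navigableKeySet().tailSet(afterKey, false);
        for (String key : keys) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Reads the whole table sequentially, one page per request,
    // collecting keys into out and returning the number of requests made.
    static int readAllInPages(TreeMap<String, String> table, int pageSize, List<String> out) {
        int requests = 0;
        String lastKey = null;
        while (true) {
            List<String> page = fetchPage(table, lastKey, pageSize);
            requests++;
            out.addAll(page);
            if (page.size() < pageSize) {
                return requests; // a short (or empty) page means no rows are left
            }
            lastKey = page.get(page.size() - 1); // resume after the last key seen
        }
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            table.put(String.format("row%02d", i), "value" + i);
        }
        List<String> rows = new ArrayList<>();
        int requests = readAllInPages(table, 3, rows);
        System.out.println(rows.size() + " rows in " + requests + " requests");
    }
}
```

With 10 rows and a page size of 3, this reads everything in four small requests (3 + 3 + 3 + 1 rows) instead of one large one, which is the kind of trade-off I'd like the input format to make.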
I'm looking at the ConfigHelper.setRangeBatchSize() and CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused about whether either of them is what I need and, if so, which one I should use for this purpose.
Any help is appreciated.
Hadoop version is 1.1.2, Cassandra version is 1.2.8.