I'm looking at the ConfigHelper.setRangeBatchSize() and
CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused
whether that's what I need and, if so, which one I should use for this purpose.
If you are using CQL 3 via Hadoop, CqlConfigHelper.setInputCQLPageRowSize() is the one you want.

It maps to the LIMIT clause of the SELECT statement the input reader will generate; the default is 1,000.
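
For reference, here is a minimal sketch of a job driver with that setting,
assuming the 1.2-era CQL3 input format; the keyspace, table, node address,
partitioner and the page size of 100 are made-up example values:

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class Cql3PageSizeExample {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "cql3-page-size-example");
            job.setInputFormatClass(CqlPagingInputFormat.class);

            Configuration conf = job.getConfiguration();
            // Placeholder keyspace/table and connection details.
            ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_table");
            ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");
            ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
            // LIMIT applied to each SELECT the reader generates; default is 1,000.
            CqlConfigHelper.setInputCQLPageRowSize(conf, "100");

            // ... set mapper, reducer and output as usual, then job.waitForCompletion(true);
        }
    }

A smaller page means more round trips but less work per request, which is
usually the trade-off you want when the reads are timing out.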

A
 
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/09/2013, at 9:04 AM, Jiaan Zeng <l.allen09@gmail.com> wrote:

Speaking of the Thrift client, i.e. ColumnFamilyInputFormat: yes,
ConfigHelper.setRangeBatchSize() can reduce the number of rows fetched from
Cassandra in each request.

Depending on how big your columns are, you may also want to increase the
Thrift message length through setThriftMaxMessageLengthInMb().
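
A rough sketch of both knobs together, assuming the Thrift input path; the
keyspace/table, the 256-row batch and the 64 MB limit are arbitrary
placeholder values, not recommendations:

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ThriftBatchSizeExample {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "thrift-batch-size-example");
            job.setInputFormatClass(ColumnFamilyInputFormat.class);

            Configuration conf = job.getConfiguration();
            ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_table");
            // Fetch at most 256 rows per range slice request.
            ConfigHelper.setRangeBatchSize(conf, 256);
            // Raise the Thrift frame limit if individual rows are large.
            ConfigHelper.setThriftMaxMessageLengthInMb(conf, 64);

            // ... mapper, reducer and output setup as usual.
        }
    }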

Hope that helps.

On Tue, Sep 10, 2013 at 8:18 PM, Renat Gilfanov <grennat@mail.ru> wrote:
Hi,

We have Hadoop jobs that read data from our Cassandra column families and
write some data back to other column families.
The input column families are pretty simple CQL3 tables without wide rows.
In the Hadoop jobs we set the corresponding WHERE clause via
ConfigHelper.setInputWhereClauses(...), so we don't process the whole table
at once. Nevertheless, sometimes the amount of data returned by the input
query is big enough to cause TimedOutExceptions.

To mitigate this, I'd like to configure the Hadoop job in such a way that it
sequentially fetches the input rows in smaller portions.

I'm looking at the ConfigHelper.setRangeBatchSize() and
CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused
whether that's what I need and, if so, which one I should use for this purpose.

Any help is appreciated.

Hadoop version is 1.1.2, Cassandra version is 1.2.8.



--
Regards,
Jiaan