incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Cassandra input paging for Hadoop
Date Thu, 12 Sep 2013 02:23:01 GMT
>> 
>> I'm looking at the ConfigHelper.setRangeBatchSize() and
>> CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused about
>> whether that's what I need and, if so, which one I should use.
If you are using CQL 3 via Hadoop, CqlConfigHelper.setInputCQLPageRowSize is the one you want.

It maps to the LIMIT clause of the SELECT statement that the input reader generates; the default
is 1,000.
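
To make the effect of the page size concrete, here is a minimal, self-contained sketch of LIMIT-style paging. It is not Cassandra code: the class PagedReadSketch, its methods, and the in-memory TreeMap standing in for a table are all hypothetical, and it pages by plain key order where the real reader would page by token. It only shows the general idea that each request returns at most one page of rows and the next request resumes after the last key seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a paged input reader; not Cassandra code.
class PagedReadSketch {

    // Returns at most pageSize rows whose key is strictly greater than afterKey,
    // loosely mimicking "SELECT ... WHERE key > ? LIMIT pageSize".
    // Pass null as afterKey to fetch the first page.
    static List<Map.Entry<Long, String>> fetchPage(
            NavigableMap<Long, String> table, Long afterKey, int pageSize) {
        NavigableMap<Long, String> remaining =
                (afterKey == null) ? table : table.tailMap(afterKey, false);
        List<Map.Entry<Long, String>> page = new ArrayList<>();
        for (Map.Entry<Long, String> row : remaining.entrySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(row);
        }
        return page;
    }

    // Reads the whole table one page at a time and returns the pages in order.
    static List<List<Map.Entry<Long, String>>> readAllPages(
            NavigableMap<Long, String> table, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<Map.Entry<Long, String>>> pages = new ArrayList<>();
        Long lastKey = null;
        while (true) {
            List<Map.Entry<Long, String>> page = fetchPage(table, lastKey, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            pages.add(page);
            lastKey = page.get(page.size() - 1).getKey();
            if (page.size() < pageSize) {
                break; // a short page means the table is exhausted
            }
        }
        return pages;
    }

    // Builds a sample table with keys 0..rows-1.
    static NavigableMap<Long, String> sampleTable(int rows) {
        NavigableMap<Long, String> table = new TreeMap<>();
        for (long key = 0; key < rows; key++) {
            table.put(key, "value-" + key);
        }
        return table;
    }

    public static void main(String[] args) {
        // 2,500 rows with a page size of 1,000 are read as pages of 1,000, 1,000 and 500 rows.
        List<List<Map.Entry<Long, String>>> pages = readAllPages(sampleTable(2500), 1000);
        System.out.println("pages fetched: " + pages.size());
    }
}
```

A smaller page size means more round trips but less data per request, which is the trade-off to tune if individual requests are timing out.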

A
 
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/09/2013, at 9:04 AM, Jiaan Zeng <l.allen09@gmail.com> wrote:

> For the Thrift client, i.e. ColumnFamilyInputFormat, yes:
> ConfigHelper.setRangeBatchSize() can reduce the number of rows fetched from
> Cassandra per request.
> 
> Depending on how big your columns are, you may also want to increase the Thrift
> message length through setThriftMaxMessageLengthInMb().
> 
> Hope that helps.
> 
> On Tue, Sep 10, 2013 at 8:18 PM, Renat Gilfanov <grennat@mail.ru> wrote:
>> Hi,
>> 
>> We have Hadoop jobs that read data from our Cassandra column families and
>> write some data back to other column families.
>> The input column families are pretty simple CQL3 tables without wide rows.
>> In the Hadoop jobs we set up the corresponding WHERE clause with
>> CqlConfigHelper.setInputWhereClauses(...), so we don't process the whole table
>> at once.
>> Nevertheless, sometimes the amount of data returned by the input query is big
>> enough to cause TimedOutExceptions.
>> 
>> To mitigate this, I'd like to configure the Hadoop job in such a way that it
>> sequentially fetches input rows in smaller portions.
>> 
>> I'm looking at the ConfigHelper.setRangeBatchSize() and
>> CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused about
>> whether that's what I need and, if so, which one I should use.
>> 
>> Any help is appreciated.
>> 
>> Hadoop version is 1.1.2, Cassandra version is 1.2.8.
> 
> 
> 
> -- 
> Regards,
> Jiaan

