incubator-cassandra-user mailing list archives

From Renat Gilfanov <gren...@mail.ru>
Subject Re[2]: Cassandra input paging for Hadoop
Date Thu, 12 Sep 2013 05:58:09 GMT
Hello,

So does that mean the job will process only the first "cassandra.input.page.row.size" rows and
ignore the rest? Or does CqlPagingRecordReader support paging through the entire result set?


  Aaron Morton <aaron@thelastpickle.com>:
>>>I'm looking at the ConfigHelper.setRangeBatchSize() and
>>>CqlConfigHelper.setInputCQLPageRowSize() methods, but a bit confused if
>>>that's what I need and, if yes, which one should I use for those purposes.
>
>If you are using CQL 3 via Hadoop, CqlConfigHelper.setInputCQLPageRowSize is the one you want.
>
>It maps to the LIMIT clause of the SELECT statement the input reader will generate; the
>default is 1,000. (A minimal sketch of this setting follows the thread.)
>
>A
>
>-----------------
>Aaron Morton
>New Zealand
>@aaronmorton
>
>Co-Founder & Principal Consultant
>Apache Cassandra Consulting
>http://www.thelastpickle.com
>
>On 12/09/2013, at 9:04 AM, Jiaan Zeng <l.allen09@gmail.com> wrote:
>>Speaking of the thrift client, i.e. ColumnFamilyInputFormat: yes,
>>ConfigHelper.setRangeBatchSize() can reduce the number of rows fetched from
>>Cassandra per request.
>>
>>Depending on how big your columns are, you may also want to increase the thrift
>>message length through setThriftMaxMessageLengthInMb(). (Both knobs are
>>sketched after the thread.)
>>
>>Hope that helps.
>>
>>On Tue, Sep 10, 2013 at 8:18 PM, Renat Gilfanov <grennat@mail.ru> wrote:
>>>Hi,
>>>
>>>We have Hadoop jobs that read data from our Cassandra column families and
>>>write some data back to other column families.
>>>The input column families are pretty simple CQL3 tables without wide rows.
>>>In the Hadoop jobs we set up a corresponding WHERE clause via
>>>CqlConfigHelper.setInputWhereClauses(...), so we don't process the whole table
>>>at once (a sketch of this follows the thread). Nevertheless, the amount of data
>>>returned by the input query is sometimes big enough to cause TimedOutExceptions.
>>>
>>>To mitigate this, I'd like to configure the Hadoop job in such a way that it
>>>sequentially fetches the input rows in smaller portions.
>>>
>>>I'm looking at the ConfigHelper.setRangeBatchSize() and
>>>CqlConfigHelper.setInputCQLPageRowSize() methods, but am a bit confused whether
>>>that's what I need and, if so, which one I should use for this purpose.
>>>
>>>Any help is appreciated.
>>>
>>>Hadoop version is 1.1.2, Cassandra version is 1.2.8.
>>
>>
>>
>>-- 
>>Regards,
>>Jiaan
>
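
A minimal sketch of the page-size setting Aaron describes, against the Hadoop 1.1.2 /
Cassandra 1.2.x APIs named in the thread. The host, keyspace, and table names are
placeholders, and 500 is just an example value:

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CqlPageSizeExample {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "cql-page-size-example");
            job.setInputFormatClass(CqlPagingInputFormat.class);

            Configuration conf = job.getConfiguration();
            ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");       // placeholder host
            ConfigHelper.setInputRpcPort(conf, "9160");
            ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
            ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_table"); // placeholders

            // Each page the record reader fetches maps to a SELECT ... LIMIT 500.
            // The reader keeps requesting pages until the split is exhausted, so
            // this caps the size of each round trip, not the total row count.
            CqlConfigHelper.setInputCQLPageRowSize(conf, "500");
        }
    }

This also answers the follow-up question at the top of the thread: rows beyond the first
page are not ignored; the page size only bounds how many rows come back per round trip.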
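For jobs still on the thrift path (ColumnFamilyInputFormat), a sketch of the two knobs
Jiaan mentions; the slice predicate and the values 256 and 32 are illustrative
assumptions, not recommendations:

    import java.nio.ByteBuffer;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ThriftBatchExample {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "thrift-batch-example");
            job.setInputFormatClass(ColumnFamilyInputFormat.class);

            Configuration conf = job.getConfiguration();
            ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");       // placeholder
            ConfigHelper.setInputRpcPort(conf, "9160");
            ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
            ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_cf"); // placeholders

            // Read every column of each row (empty start/finish = unbounded slice).
            SlicePredicate predicate = new SlicePredicate().setSlice_range(
                    new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0),
                                   false, Integer.MAX_VALUE));
            ConfigHelper.setInputSlicePredicate(conf, predicate);

            // Fewer rows per get_range_slices call = smaller, faster round trips.
            ConfigHelper.setRangeBatchSize(conf, 256);

            // Larger thrift frames, in case individual rows are big.
            ConfigHelper.setThriftMaxMessageLengthInMb(conf, 32);
        }
    }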
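And a sketch of the WHERE-clause scoping from the original question; note that
setInputWhereClauses lives on CqlConfigHelper in 1.2.x, and "event_date" is a
hypothetical column:

    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
    import org.apache.hadoop.conf.Configuration;

    public class InputScopingExample {
        // Narrow what the input query reads so a single job run never scans
        // the whole table; the clause is appended to the generated SELECT.
        static void scopeToDay(Configuration conf, String day) {
            // "event_date" is hypothetical; it must be a column Cassandra can
            // filter on server-side (e.g. secondary-indexed) in 1.2.x.
            CqlConfigHelper.setInputWhereClauses(conf, "event_date = '" + day + "'");
        }
    }

Combining this with setInputCQLPageRowSize, as above, bounds both the scope of the
query and the size of each round trip.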