incubator-cassandra-user mailing list archives

From Jiaan Zeng <l.alle...@gmail.com>
Subject Re: Cassandra input paging for Hadoop
Date Wed, 11 Sep 2013 21:04:18 GMT
Speaking of the Thrift client, i.e. ColumnFamilyInputFormat: yes,
ConfigHelper.setRangeBatchSize() can reduce the number of rows fetched from
Cassandra per request.

Depending on how big your columns are, you may also want to increase the
Thrift message length through setThriftMaxMessageLengthInMb().
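
To get a feel for the trade-off, here is a rough back-of-the-envelope sketch.
It is a simplified model and not Cassandra code: it assumes each range request
returns at most one batch of rows, and the class name, method name, and row
counts are made up for illustration.

```java
// Simplified model, not Cassandra code: assumes every range request returns
// at most batchSize rows, so reading a split takes ceil(rows / batchSize) requests.
class RangeBatchSketch {

    // Number of requests needed to read rowsInSplit rows, batchSize rows at a time.
    static int requestsPerSplit(int rowsInSplit, int batchSize) {
        if (rowsInSplit < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rowsInSplit must be >= 0 and batchSize > 0");
        }
        // Ceiling division, done in long arithmetic to avoid int overflow.
        return (int) (((long) rowsInSplit + batchSize - 1) / batchSize);
    }

    public static void main(String[] args) {
        int rowsInSplit = 65536; // illustrative split size, not taken from your job
        for (int batchSize : new int[] {4096, 500}) {
            System.out.println("batch size " + batchSize + ": "
                    + requestsPerSplit(rowsInSplit, batchSize) + " requests per split");
        }
    }
}
```

The batch size you settle on is the value you would hand to
ConfigHelper.setRangeBatchSize(). Under this model, a smaller batch means each
request asks Cassandra for less work at the price of more round trips per split.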

Hope that helps.

On Tue, Sep 10, 2013 at 8:18 PM, Renat Gilfanov <grennat@mail.ru> wrote:
> Hi,
>
> We have Hadoop jobs that read data from our Cassandra column families and
> write some data back to other column families.
> The input column families are pretty simple CQL3 tables without wide rows.
> In Hadoop jobs we set up the corresponding WHERE clause in
> ConfigHelper.setInputWhereClauses(...), so we don't process the whole table
> at once.
> Nevertheless, sometimes the amount of data returned by the input query is big
> enough to cause TimedOutExceptions.
>
> To mitigate this, I'd like to configure the Hadoop job in such a way that it
> sequentially fetches input rows in smaller portions.
>
> I'm looking at the ConfigHelper.setRangeBatchSize() and
> CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused about
> whether that's what I need and, if so, which one I should use for this purpose.
>
> Any help is appreciated.
>
> Hadoop version is 1.1.2, Cassandra version is 1.2.8.



-- 
Regards,
Jiaan
