cassandra-user mailing list archives

From Venkatesh Kandaswamy <ve...@walmartlabs.com>
Subject Re: InputCQLPageRowSize seems to be behaving differently (or I am doing something wrong)
Date Tue, 30 Jun 2015 03:40:59 GMT
I was going through the WordCount example in the latest Apache C* source
(2.1.7). It references
org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat, but that class is
not in the source tree or in the compiled binary. It looks like we cannot
really use C* with Hadoop without a paging input format. Is there a reason
it was removed, even though the example still references it? I am confused.
Please shed some light if you know the answer.

------------------------------------

Venky Kandaswamy
925-200-7124





On 6/29/15, 1:15 PM, "Venkatesh Kandaswamy" <venky@walmartlabs.com> wrote:

>All,
>   I converted one of my C* programs to Hadoop 2.x and the C* DataStax
>drivers for 2.1.0. The original program (Hadoop 1.x) worked fine when we
>set InputCQLPageRowSize and InputSplitSize to reasonable values.
>For example, with 60K rows, a page row size of 100, and a split size of
>10000, it ran 6 mappers and gave us all 60K rows. After we switched to the
>2.1.x version of the DataStax drivers, the same program returns only 600
>rows.
>
>It looks like the paging logic has changed and only the first page of
>100 rows is being fetched per mapper. How do we get all the rows?
>
>------------------------------------
>Venky Kandaswamy
>925-200-7124
>
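
As a rough sanity check on the numbers in the quoted message, the sketch below works through the arithmetic. It is only an illustration: the class and method names are hypothetical, it assumes rows are spread evenly across splits with one mapper per split, and it does not call any Cassandra or Hadoop API. Under those assumptions, 600 rows is exactly what you would see if each of the 6 mappers stopped after its first page of 100 rows.

```java
// Hypothetical helper, not part of Cassandra or Hadoop. It only reproduces
// the arithmetic from the message: 60K rows, split size 10000, page size 100.
public class PagingArithmetic {

    /** Number of splits (one mapper each), assuming rows are evenly spread. */
    public static long mappers(long totalRows, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        return (totalRows + splitSize - 1) / splitSize; // ceiling division
    }

    /** Rows returned if every mapper reads only the first page of its split. */
    public static long rowsIfOnlyFirstPageRead(long totalRows, long splitSize,
                                               long pageRowSize) {
        // A first page can never hold more rows than the split itself.
        long firstPage = Math.min(pageRowSize, splitSize);
        return mappers(totalRows, splitSize) * firstPage;
    }

    public static void main(String[] args) {
        long totalRows = 60_000;
        long splitSize = 10_000;
        long pageRowSize = 100;

        System.out.println("mappers = " + mappers(totalRows, splitSize));
        System.out.println("rows if only first page is read = "
                + rowsIfOnlyFirstPageRead(totalRows, splitSize, pageRowSize));
    }
}
```

With the values from the message this gives 6 mappers and 600 rows, which matches the observed count. That is consistent with, but does not prove, the guess that paging stops after the first page.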

