cassandra-user mailing list archives

From Venkatesh Kandaswamy
Subject Re: InputCQLPageRowSize seems to be behaving differently (or I am doing something wrong)
Date Tue, 30 Jun 2015 03:40:59 GMT
I was going through the WordCount example in the latest Apache C* source
(2.1.7). It references
org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat, but that class is
in neither the source tree nor the compiled binary. It looks like we
cannot really use C* with Hadoop without a paging input format. Is there a
reason it was removed, given that the example still refers to it? I am
confused. Please shed some light if you know the answer.


Venky Kandaswamy

On 6/29/15, 1:15 PM, "Venkatesh Kandaswamy" wrote:

>   I converted one of my C* programs to Hadoop 2.x and the DataStax C*
>drivers for 2.1.0. The original program (Hadoop 1.x) worked fine when we
>set InputCQLPageRowSize and InputSplitSize to reasonable values. For
>example, with 60K rows, a page row size of 100 and a split size of 10000
>ran 6 mappers and returned all 60K rows. After switching to the 2.1.x
>version of the DataStax drivers, the same program returns only 600 rows,
>that is, 100 rows from each of the 6 mappers. It looks like the paging
>logic has changed and each mapper is getting only its first page of 100
>rows. How do we get all the rows?
>Venky Kandaswamy
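
To make the numbers in the quoted message concrete, here is a small standalone sketch of the arithmetic. It does not call any Cassandra or Hadoop API. The class and method names are made up for illustration, and the "first page only" behaviour is just the hypothesis from the message, not something confirmed from the C* source.

```java
// Hypothetical model of the symptom described above; not Cassandra code.
public class PagingArithmetic {

    // Number of input splits (one mapper per split), using ceiling division.
    static long splitCount(long totalRows, long splitSize) {
        return (totalRows + splitSize - 1) / splitSize;
    }

    // Rows returned when every mapper pages through its whole split.
    static long rowsWithFullPaging(long totalRows, long splitSize) {
        long total = 0;
        long remaining = totalRows;
        while (remaining > 0) {
            long rowsInSplit = Math.min(splitSize, remaining);
            total += rowsInSplit;   // all pages of the split are consumed
            remaining -= rowsInSplit;
        }
        return total;
    }

    // Rows returned if every mapper stops after its first page
    // (the assumed failure mode).
    static long rowsWithFirstPageOnly(long totalRows, long splitSize, long pageRowSize) {
        long total = 0;
        long remaining = totalRows;
        while (remaining > 0) {
            long rowsInSplit = Math.min(splitSize, remaining);
            total += Math.min(pageRowSize, rowsInSplit); // only one page is read
            remaining -= rowsInSplit;
        }
        return total;
    }

    public static void main(String[] args) {
        long rows = 60_000, split = 10_000, page = 100;
        System.out.println("mappers:         " + splitCount(rows, split));
        System.out.println("full paging:     " + rowsWithFullPaging(rows, split));
        System.out.println("first page only: " + rowsWithFirstPageOnly(rows, split, page));
    }
}
```

With 60,000 rows, a split size of 10,000 and a page row size of 100, this prints 6 mappers, 60000 rows with full paging, and 600 rows when only the first page is read, which matches the 600 rows reported.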
