incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Pig not reading all cassandra data
Date Sat, 05 Feb 2011 06:01:51 GMT
On Fri, Feb 4, 2011 at 9:47 PM, Matt Kennedy <stinkymatt@gmail.com> wrote:
> Found the culprit.  There is a new feature in Pig 0.8 that will try to
> reduce the number of splits used to speed up the whole job.  Since the
> ColumnFamilyInputFormat lists the input size as zero, this feature
> eliminates all of the splits except for one.
>
> The workaround is to disable this feature for jobs that use CassandraStorage
> by setting -Dpig.splitCombination=false in the pig_cassandra script.
>
> Hope somebody finds this useful, you wouldn't believe how many dead-ends I
> ran down trying to figure this out.

Ouch, thanks for tracking that down.

What should CFIF be returning differently?  Do you mean
InputSplit.getLength()?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
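
To illustrate the behaviour described in the quoted message, here is a
minimal sketch of a size-based split combiner. It is not Pig's actual
implementation: the class name, the greedy grouping rule, and the 64 MB
limit are all assumptions made for illustration. It only shows why
splits that report a length of zero would all end up in a single
combined split, while splits that report a realistic length stay
separate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; Pig's real split combination logic differs.
class SplitCombinationSketch {

    // Greedily groups split indexes until the summed reported length
    // would exceed maxCombinedSize, then starts a new group.
    static List<List<Integer>> combine(long[] reportedLengths, long maxCombinedSize) {
        List<List<Integer>> combined = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentSize = 0;

        for (int i = 0; i < reportedLengths.length; i++) {
            long len = reportedLengths[i];
            // Close the current group if adding this split would overflow it.
            if (!current.isEmpty() && currentSize + len > maxCombinedSize) {
                combined.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(i);
            currentSize += len;
        }
        if (!current.isEmpty()) {
            combined.add(current);
        }
        return combined;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long maxCombinedSize = 64 * mb; // assumed limit, for illustration

        // Splits that report length 0 never reach the limit,
        // so they all fall into one combined split.
        long[] zeroLengths = {0, 0, 0, 0};
        System.out.println(combine(zeroLengths, maxCombinedSize).size()); // prints 1

        // Splits that report 40 MB each cannot be paired under a 64 MB limit.
        long[] sizedLengths = {40 * mb, 40 * mb, 40 * mb, 40 * mb};
        System.out.println(combine(sizedLengths, maxCombinedSize).size()); // prints 4
    }
}
```

Under these assumptions, a non-zero length estimate from
InputSplit.getLength() would be enough to keep the combiner from
folding every split into one.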
