cassandra-user mailing list archives

From "Matthew E. Kennedy" <>
Subject Re: Pig not reading all cassandra data
Date Wed, 02 Feb 2011 21:34:54 GMT

I noticed in the jobtracker log that when the Pig job kicks off, I get the following info:

2011-02-02 09:13:07,269 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201101241634_0193 = 0. Number of splits = 1

So I looked at the job.split file created for the Pig job and compared it to the job.split file created for the map-reduce job. The map-reduce file contains an entry for each split, whereas the job.split file for the Pig job contains just the one split.

I added some logging to ColumnFamilyInputFormat to output the input splits it computes for the Pig job, and the call to getSplits() appears to return the correct list of splits. I can't figure out where it goes wrong, though, when the splits are written to the job.split file.
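To illustrate what I mean by "an entry for each split": as I understand it, Hadoop serializes the splits Writable-style (a count, then each split's fields) when it writes that file. Here's a stdlib-only sketch of that round-trip pattern; FakeSplit and SplitFileDemo are made-up names for illustration, not the real Hadoop classes.

```java
import java.io.*;
import java.util.*;

// Hypothetical stand-in for a Hadoop input split (not the real
// org.apache.hadoop.mapreduce.InputSplit): each split carries a length
// and the hosts holding its data, serialized Writable-style.
class FakeSplit {
    final long length;
    final String[] locations;

    FakeSplit(long length, String[] locations) {
        this.length = length;
        this.locations = locations;
    }

    void write(DataOutput out) throws IOException {
        out.writeLong(length);
        out.writeInt(locations.length);
        for (String loc : locations) {
            out.writeUTF(loc);
        }
    }

    static FakeSplit readFrom(DataInput in) throws IOException {
        long len = in.readLong();
        String[] locs = new String[in.readInt()];
        for (int i = 0; i < locs.length; i++) {
            locs[i] = in.readUTF();
        }
        return new FakeSplit(len, locs);
    }
}

public class SplitFileDemo {
    // Serialize a split count followed by one entry per split -- the shape
    // a healthy split file should have. A job whose file holds only one
    // entry gets a single map task no matter how large the input is.
    static byte[] writeSplits(List<FakeSplit> splits) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(splits.size());
        for (FakeSplit split : splits) {
            split.write(out);
        }
        out.flush();
        return buf.toByteArray();
    }

    static List<FakeSplit> readSplits(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = in.readInt();
        List<FakeSplit> splits = new ArrayList<FakeSplit>();
        for (int i = 0; i < count; i++) {
            splits.add(FakeSplit.readFrom(in));
        }
        return splits;
    }

    public static void main(String[] args) throws IOException {
        List<FakeSplit> splits = Arrays.asList(
                new FakeSplit(64L, new String[] { "node1" }),
                new FakeSplit(128L, new String[] { "node2", "node3" }));
        byte[] data = writeSplits(splits);
        System.out.println("splits round-tripped: " + readSplits(data).size());
    }
}
```

In my case, getSplits() returns a correct list, so somewhere between that return and the write step the Pig job is collapsing the list down to one entry.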

Does anybody know the specific class responsible for creating that file in a Pig job, and why it might be affected by using the Pig CassandraStorage module?

Is anyone else successfully running Pig jobs against a 0.7 cluster?
