---------- Forwarded message ----------
From: "Mark Lewandowski" <mark.e.lewandowski@gmail.com>
Date: Jun 8, 2013 8:03 AM
Subject: Cassandra (1.2.5) + Pig (0.11.1) Errors with large column families
To: <user@cassandra.apache.org>

> I'm currently trying to get Cassandra (1.2.5) and Pig (0.11.1) to play nice together.  I'm running a basic script:
>
> rows = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
> dump rows;
>
> This fails for my column family which has ~100,000 rows.  However, if I modify the script to this:
>
> rows = LOAD 'cassandra://betable_games/bets' USING CassandraStorage();
> rows = limit rows 7000;
> dump rows;
>
> Then it seems to work.  7000 is about the highest limit I've been able to use before it fails.  The error I keep getting is:
>
> 2013-06-07 14:58:49,119 [Thread-4] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 4480
> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
> Caused by: org.apache.thrift.TException: Message length exceeded: 4480
> at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
> at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
> at org.apache.cassandra.thrift.Column.read(Column.java:535)
> at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
> at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
> at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
> at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
> ... 13 more
>
>
> I've seen a similar problem reported on this mailing list for Cassandra 1.2.3.  However, the fixes suggested on that thread -- increasing thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb in cassandra.yaml -- did not appear to have any effect.  Has anyone else seen this issue, and how can I fix it?
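>
> In case it's relevant, I'm also wondering whether the Hadoop-side batch size is the real knob here rather than the yaml settings.  If I'm reading the ColumnFamilyRecordReader/ConfigHelper source correctly (I may not be, so treat the property name as a guess on my part), setting it at the top of the Pig script should shrink each get_range_slices call, which might keep the response under the Thrift message limit:
>
> SET cassandra.range.batch.size 256;
> rows = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
> dump rows;
>
> I haven't confirmed that this property is actually picked up by CassandraStorage, so corrections welcome.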
>
> Thanks,
>
> -Mark