incubator-cassandra-user mailing list archives

From Lanny Ripple <la...@spotright.com>
Subject Re: Thrift message length exceeded
Date Wed, 17 Apr 2013 22:34:55 GMT
It's slow going finding the time to do so, but I'm working on that.

We do have another table that has one or sometimes two columns per row; we can run jobs
against it without issue.  I've looked through the org.apache.cassandra.hadoop code and don't
see anything that's really changed since 1.1.5 (which also used thrift-0.7), so it's something
of a puzzler what's going on.
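
For anyone chasing the same version angle, a quick runtime probe (just a sketch) can confirm
which jar TBinaryProtocol is actually loaded from inside a task, ruling out a stray thrift
sneaking onto the task classpath:

  // Sketch: drop into e.g. a mapper's setup(); prints the jar backing TBinaryProtocol.
  println(classOf[org.apache.thrift.protocol.TBinaryProtocol]
    .getProtectionDomain.getCodeSource.getLocation)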


On Apr 17, 2013, at 2:47 PM, aaron morton <aaron@thelastpickle.com> wrote:

> Can you reproduce this in a simple way?
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/04/2013, at 5:50 AM, Lanny Ripple <lanny@spotright.com> wrote:
> 
>> That was our first thought.  Using Maven's dependency tree info, we verified that we're
>> using the expected (cass 1.2.3) jars:
>> 
>> $ mvn dependency:tree | grep thrift
>> [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
>> [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
>> 
>> I've also dumped the final command run by the hadoop we use (CDH3u5) and verified it's
>> not sneaking thrift in on us.
>> 
>> 
>> On Tue, Apr 16, 2013 at 4:36 PM, aaron morton <aaron@thelastpickle.com> wrote:
>> Can you confirm that you are using the same thrift version that ships with 1.2.3?
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 16/04/2013, at 10:17 AM, Lanny Ripple <lanny@spotright.com> wrote:
>> 
>>> A bump to say I found this
>>> 
>>>  http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
>>> 
>>> so others are seeing similar behavior.
>>> 
>>> From what I can see of org.apache.cassandra.hadoop, nothing has changed since 1.1.5,
>>> when we didn't see such things, but it sure looks like a bug has slipped in (or been
>>> uncovered) somewhere.  I'll try to narrow it down to a dataset and code that can
>>> reproduce it.
>>> 
>>> On Apr 10, 2013, at 6:29 PM, Lanny Ripple <lanny@spotright.com> wrote:
>>> 
>>>> We are using Astyanax in production, but I cut back to just Hadoop and Cassandra to
>>>> confirm it's a Cassandra (or our use of Cassandra) problem.
>>>> 
>>>> We do have some extremely large rows, but we went from everything working with 1.1.5
>>>> to almost everything carping with 1.2.3.  Something has changed.  Perhaps we were doing
>>>> something wrong earlier that 1.2.3 exposed, but surprises are never welcome in production.
>>>> 
>>>> On Apr 10, 2013, at 8:10 AM, <moshe.kranc@barclays.com> wrote:
>>>> 
>>>>> I also saw this when upgrading from C* 1.0 to 1.2.2, and from Hector 0.6 to 0.8.
>>>>> Turns out the Thrift message really was too long.
>>>>> The mystery to me: why no complaints in previous versions? Were some checks added in
>>>>> Thrift or Hector?
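>>>>>
>>>>> If it really is just size, one knob to try (a sketch, assuming the 1.1/1.2-era
>>>>> ConfigHelper setters are still present in your cassandra jar) is raising the client-side
>>>>> thrift limits when configuring the Hadoop job:
>>>>>
>>>>>  // Hedged sketch: bump the framed-transport and max-message limits (in MB).
>>>>>  ConfigHelper.setThriftFramedTransportSizeInMb(conf, 32)
>>>>>  ConfigHelper.setThriftMaxMessageLengthInMb(conf, 32)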
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Lanny Ripple [mailto:lanny@spotright.com] 
>>>>> Sent: Tuesday, April 09, 2013 6:17 PM
>>>>> To: user@cassandra.apache.org
>>>>> Subject: Thrift message length exceeded
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> We recently upgraded from Cassandra 1.1.5 to 1.2.3.  We ran sstableupgrade, got the
>>>>> ring on its feet, and are now seeing a new issue.
>>>>> 
>>>>> When we run MapReduce jobs against practically any table we find the following errors:
>>>>> 
>>>>> 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>> 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>>>> 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
>>>>> 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
>>>>> 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>>> 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running child
>>>>> java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 106
>>>>> 	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
>>>>> 	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
>>>>> 	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
>>>>> 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>>>>> 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>>>>> 	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
>>>>> 	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
>>>>> 	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
>>>>> 	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>>>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:260)
>>>>> Caused by: org.apache.thrift.TException: Message length exceeded: 106
>>>>> 	at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
>>>>> 	at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
>>>>> 	at org.apache.cassandra.thrift.Column.read(Column.java:528)
>>>>> 	at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
>>>>> 	at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
>>>>> 	at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
>>>>> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>>>>> 	at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
>>>>> 	at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
>>>>> 	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
>>>>> 	... 16 more
>>>>> 2013-04-09 09:58:50,481 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>>> 
>>>>> The message length listed on each failed job differs (not always 106).  Jobs that used
>>>>> to run fine now fail with code compiled against cass 1.2.3 (and work fine if compiled
>>>>> against 1.1.5 and run against the 1.2.3 servers in production).  I'm using the following
>>>>> setup to configure the job:
>>>>> 
>>>>> import org.apache.cassandra.hadoop.ConfigHelper
>>>>> import org.apache.cassandra.thrift.{SlicePredicate, SliceRange}
>>>>> import org.apache.hadoop.mapreduce.Job
>>>>>
>>>>> def cassConfig(job: Job) {
>>>>>   val conf = job.getConfiguration()
>>>>>
>>>>>   ConfigHelper.setInputRpcPort(conf, "9160")
>>>>>   ConfigHelper.setInputInitialAddress(conf, Config.hostip)
>>>>>
>>>>>   ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner")
>>>>>   ConfigHelper.setInputColumnFamily(conf, Config.keyspace, Config.cfname)
>>>>>
>>>>>   // Slice the whole row: empty start/finish names, up to 4096 * 1000 columns.
>>>>>   val pred = {
>>>>>     val range = new SliceRange()
>>>>>       .setStart("".getBytes("UTF-8"))
>>>>>       .setFinish("".getBytes("UTF-8"))
>>>>>       .setReversed(false)
>>>>>       .setCount(4096 * 1000)
>>>>>
>>>>>     new SlicePredicate().setSlice_range(range)
>>>>>   }
>>>>>
>>>>>   ConfigHelper.setInputSlicePredicate(conf, pred)
>>>>> }
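>>>>>
>>>>> (For scale: that count allows up to 4,096,000 columns per row in a single
>>>>> get_range_slices response; the same predicate ran cleanly under 1.1.5.)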
>>>>> 
>>>>> The job consists only of a mapper that increments counters for each row and its
>>>>> associated columns, so all I'm really doing is exercising ColumnFamilyRecordReader.
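>>>>>
>>>>> For reference, the mapper is essentially the following (a trimmed sketch; the class
>>>>> and counter names are made up for illustration, and the key/value types are the 1.2-era
>>>>> ColumnFamilyInputFormat ones):
>>>>>
>>>>>  import java.nio.ByteBuffer
>>>>>  import java.util.SortedMap
>>>>>  import org.apache.cassandra.db.IColumn
>>>>>  import org.apache.hadoop.io.Text
>>>>>  import org.apache.hadoop.mapreduce.Mapper
>>>>>
>>>>>  class CountingMapper extends Mapper[ByteBuffer, SortedMap[ByteBuffer, IColumn], Text, Text] {
>>>>>    override def map(key: ByteBuffer, columns: SortedMap[ByteBuffer, IColumn],
>>>>>                     context: Mapper[ByteBuffer, SortedMap[ByteBuffer, IColumn], Text, Text]#Context) {
>>>>>      context.getCounter("cass", "rows").increment(1L)                      // one row seen
>>>>>      context.getCounter("cass", "columns").increment(columns.size.toLong)  // its column count
>>>>>    }
>>>>>  }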
>>>>> 
>>>>> Has anyone else seen this?  Is there a workaround/fix to get our jobs running?
>>>>> 
>>>>> Thanks
>>>> 
>>> 
>> 
>> 
> 

