incubator-cassandra-user mailing list archives

From Oleksandr Petrov <oleksandr.pet...@gmail.com>
Subject Re: Thrift message length exceeded
Date Mon, 22 Apr 2013 11:57:14 GMT
I've submitted a patch that fixes the issue for 1.2.3:
https://issues.apache.org/jira/browse/CASSANDRA-5504

Maybe you guys know a better way to fix it, but that helped me in the meantime.


On Mon, Apr 22, 2013 at 1:44 AM, Oleksandr Petrov <
oleksandr.petrov@gmail.com> wrote:

> If you're using Cassandra 1.2.3 and the new Hadoop interface, which makes
> a call to next(), you'll get an infinite loop reading the same data over
> and over from your Cassandra nodes (you can see it if you enable debug
> output).
>
> next() clears key(), which is required for wide row iteration.
>
> Setting the key back fixed the issue for me.
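To make the failure mode concrete, here is a small, self-contained sketch (hypothetical names and a deliberately simplified model; the real fix is the patch attached to CASSANDRA-5504) of why losing the paging position makes the reader loop:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, much-simplified model of wide-row paging state.  The 1.2.3
// behavior was equivalent to losing the resume position between pages, so
// each get_paged_slice call restarted at the first column of the row and
// the record reader re-read the same data forever.
class WideRowPager {
    private final NavigableMap<Integer, String> columns; // one wide row: column -> value
    private final int pageSize;
    private Integer lastColumn = null; // resume point; must survive across calls

    WideRowPager(NavigableMap<Integer, String> columns, int pageSize) {
        this.columns = columns;
        this.pageSize = pageSize;
    }

    // Returns the next page of column values, resuming after the last column
    // seen.  If lastColumn were cleared between calls (the bug), every call
    // would return the first page again.
    List<String> nextPage() {
        SortedMap<Integer, String> tail = (lastColumn == null)
                ? columns
                : columns.tailMap(lastColumn, false); // strictly after resume point
        List<String> page = new ArrayList<>();
        for (Map.Entry<Integer, String> e : tail.entrySet()) {
            if (page.size() == pageSize) break;
            page.add(e.getValue());
            lastColumn = e.getKey(); // remember where this page ended
        }
        return page;
    }
}
```

With `lastColumn` preserved across calls the pager eventually returns an empty page and terminates; clearing it after every call reproduces the endless re-read of the first page.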
>
>
> On Sat, Apr 20, 2013 at 3:05 PM, Oleksandr Petrov <
> oleksandr.petrov@gmail.com> wrote:
>
>> I tried to isolate the issue in a test environment.
>>
>> Here's what I currently have.
>>
>> The setup for the test:
>> CREATE KEYSPACE cascading_cassandra WITH replication = {'class' :
>> 'SimpleStrategy', 'replication_factor' : 1};
>> USE cascading_cassandra;
>> CREATE TABLE libraries (emitted_at timestamp, additional_info varchar,
>> environment varchar, application varchar, type varchar, PRIMARY KEY
>> (application, environment, type, emitted_at)) WITH COMPACT STORAGE;
>>
>> Next, insert some test data:
>>
>> (just for example)
>> [INSERT INTO libraries (application, environment, type, additional_info,
>> emitted_at) VALUES (?, ?, ?, ?, ?);
>>  [app env type 0 #inst "2013-04-20T13:01:04.935-00:00"]]
>>
>> If the keys (e.g. "app", "env", "type") are all the same across the
>> dataset, it works correctly.
>> As soon as I start varying the keys, e.g. "app1", "app2", "app3", I get
>> the Message Length Exceeded error.
>>
>> Does anyone have any ideas?
>> Thanks for the help!
>>
>>
>> On Sat, Apr 20, 2013 at 1:56 PM, Oleksandr Petrov <
>> oleksandr.petrov@gmail.com> wrote:
>>
>>> I can confirm running into the same problem.
>>>
>>> I tried ConfigHelper.setThriftMaxMessageLengthInMb(), tuning the server
>>> side, and reducing/increasing the batch size.
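For anyone trying the same knob: it is set on the job's Hadoop Configuration before submission. A minimal, illustrative fragment, assuming the 1.2-era ConfigHelper API named above; the value is arbitrary and, as noted in this thread, raising it did not cure this particular problem:

```java
// Illustrative fragment (1.2-era Hadoop/Cassandra job setup); the 64 MB
// value is arbitrary, and raising the limit did not fix the errors above.
Configuration conf = job.getConfiguration();
ConfigHelper.setThriftMaxMessageLengthInMb(conf, 64);
```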
>>>
>>> Here's the stack trace from Hadoop/Cassandra; maybe it gives a hint:
>>>
>>> Caused by: org.apache.thrift.protocol.TProtocolException: Message length exceeded: 8
>>>   at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
>>>   at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
>>>   at org.apache.cassandra.thrift.Column.read(Column.java:528)
>>>   at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
>>>   at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
>>>   at org.apache.cassandra.thrift.Cassandra$get_paged_slice_result.read(Cassandra.java:14157)
>>>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>>>   at org.apache.cassandra.thrift.Cassandra$Client.recv_get_paged_slice(Cassandra.java:769)
>>>   at org.apache.cassandra.thrift.Cassandra$Client.get_paged_slice(Cassandra.java:753)
>>>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:438)
>>>
>>>
>>> On Thu, Apr 18, 2013 at 12:34 AM, Lanny Ripple <lanny@spotright.com> wrote:
>>>
>>>> It's slow going finding the time to do so but I'm working on that.
>>>>
>>>> We do have another table that has one or sometimes two columns per row.
>>>> We can run jobs on it without issue.  I looked through the
>>>> org.apache.cassandra.hadoop code and don't see anything that's really
>>>> changed since 1.1.5 (which was also using thrift-0.7), so it's something
>>>> of a puzzler what's going on.
>>>>
>>>>
>>>> On Apr 17, 2013, at 2:47 PM, aaron morton <aaron@thelastpickle.com>
>>>> wrote:
>>>>
>>>> > Can you reproduce this in a simple way ?
>>>> >
>>>> > Cheers
>>>> >
>>>> > -----------------
>>>> > Aaron Morton
>>>> > Freelance Cassandra Consultant
>>>> > New Zealand
>>>> >
>>>> > @aaronmorton
>>>> > http://www.thelastpickle.com
>>>> >
>>>> > On 18/04/2013, at 5:50 AM, Lanny Ripple <lanny@spotright.com> wrote:
>>>> >
>>>> >> That was our first thought.  Using Maven's dependency tree info we
>>>> >> verified that we're using the expected (Cass 1.2.3) jars:
>>>> >>
>>>> >> $ mvn dependency:tree | grep thrift
>>>> >> [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
>>>> >> [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
>>>> >>
>>>> >> I've also dumped the final command run by the Hadoop we use (CDH3u5)
>>>> >> and verified it's not sneaking Thrift in on us.
>>>> >>
>>>> >>
>>>> >> On Tue, Apr 16, 2013 at 4:36 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>>> >> Can you confirm that you are using the same Thrift version that ships
>>>> >> with 1.2.3?
>>>> >>
>>>> >> Cheers
>>>> >>
>>>> >> -----------------
>>>> >> Aaron Morton
>>>> >> Freelance Cassandra Consultant
>>>> >> New Zealand
>>>> >>
>>>> >> @aaronmorton
>>>> >> http://www.thelastpickle.com
>>>> >>
>>>> >> On 16/04/2013, at 10:17 AM, Lanny Ripple <lanny@spotright.com> wrote:
>>>> >>
>>>> >>> A bump to say I found this:
>>>> >>>
>>>> >>> http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
>>>> >>>
>>>> >>> so others are seeing similar behavior.
>>>> >>>
>>>> >>> From what I can see of org.apache.cassandra.hadoop, nothing has
>>>> >>> changed since 1.1.5, when we didn't see such things, but it sure looks
>>>> >>> like there's a bug that slipped in (or been uncovered) somewhere.  I'll
>>>> >>> try to narrow it down to a dataset and code that can reproduce it.
>>>> >>>
>>>> >>> On Apr 10, 2013, at 6:29 PM, Lanny Ripple <lanny@spotright.com> wrote:
>>>> >>>
>>>> >>>> We are using Astyanax in production, but I cut back to just Hadoop
>>>> >>>> and Cassandra to confirm it's a Cassandra (or our use of Cassandra)
>>>> >>>> problem.
>>>> >>>>
>>>> >>>> We do have some extremely large rows, but we went from everything
>>>> >>>> working with 1.1.5 to almost everything carping with 1.2.3.  Something
>>>> >>>> has changed.  Perhaps we were doing something wrong earlier that 1.2.3
>>>> >>>> exposed, but surprises are never welcome in production.
>>>> >>>>
>>>> >>>> On Apr 10, 2013, at 8:10 AM, <moshe.kranc@barclays.com> wrote:
>>>> >>>>
>>>> >>>>> I also saw this when upgrading from C* 1.0 to 1.2.2, and from
>>>> >>>>> Hector 0.6 to 0.8.
>>>> >>>>> Turns out the Thrift message really was too long.
>>>> >>>>> The mystery to me: why no complaints in previous versions?  Were
>>>> >>>>> some checks added in Thrift or Hector?
>>>> >>>>>
>>>> >>>>> -----Original Message-----
>>>> >>>>> From: Lanny Ripple [mailto:lanny@spotright.com]
>>>> >>>>> Sent: Tuesday, April 09, 2013 6:17 PM
>>>> >>>>> To: user@cassandra.apache.org
>>>> >>>>> Subject: Thrift message length exceeded
>>>> >>>>>
>>>> >>>>> Hello,
>>>> >>>>>
>>>> >>>>> We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran
>>>> >>>>> sstableupgrade and got the ring on its feet, and we are now seeing a
>>>> >>>>> new issue.
>>>> >>>>>
>>>> >>>>> When we run MapReduce jobs against practically any table, we find
>>>> >>>>> the following errors:
>>>> >>>>>
>>>> >>>>> 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>> >>>>> 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>>> >>>>> 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
>>>> >>>>> 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
>>>> >>>>> 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>> >>>>> 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running child
>>>> >>>>> java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 106
>>>> >>>>>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
>>>> >>>>>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
>>>> >>>>>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
>>>> >>>>>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>>>> >>>>>   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>>>> >>>>>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
>>>> >>>>>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
>>>> >>>>>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
>>>> >>>>>   at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>>> >>>>>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>>> >>>>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>>>> >>>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>>>> >>>>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>>>> >>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>> >>>>>   at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> >>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>>>> >>>>>   at org.apache.hadoop.mapred.Child.main(Child.java:260)
>>>> >>>>> Caused by: org.apache.thrift.TException: Message length exceeded: 106
>>>> >>>>>   at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
>>>> >>>>>   at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
>>>> >>>>>   at org.apache.cassandra.thrift.Column.read(Column.java:528)
>>>> >>>>>   at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
>>>> >>>>>   at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
>>>> >>>>>   at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
>>>> >>>>>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>>>> >>>>>   at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
>>>> >>>>>   at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
>>>> >>>>>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
>>>> >>>>>   ... 16 more
>>>> >>>>> 2013-04-09 09:58:50,481 INFO org.apache.hadoop.mapred.Task: Running cleanup for the task
>>>> >>>>>
>>>> >>>>> The message length listed on each failed job differs (not always
>>>> >>>>> 106).  Jobs that used to run fine now fail with code compiled against
>>>> >>>>> Cass 1.2.3 (and work fine if compiled against 1.1.5 and run against
>>>> >>>>> the 1.2.3 servers in production).  I'm using the following setup to
>>>> >>>>> configure the job:
>>>> >>>>>
>>>> >>>>> def cassConfig(job: Job) {
>>>> >>>>>   val conf = job.getConfiguration()
>>>> >>>>>
>>>> >>>>>   ConfigHelper.setInputRpcPort(conf, "" + 9160)
>>>> >>>>>   ConfigHelper.setInputInitialAddress(conf, Config.hostip)
>>>> >>>>>
>>>> >>>>>   ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner")
>>>> >>>>>   ConfigHelper.setInputColumnFamily(conf, Config.keyspace, Config.cfname)
>>>> >>>>>
>>>> >>>>>   val pred = {
>>>> >>>>>     val range = new SliceRange()
>>>> >>>>>       .setStart("".getBytes("UTF-8"))
>>>> >>>>>       .setFinish("".getBytes("UTF-8"))
>>>> >>>>>       .setReversed(false)
>>>> >>>>>       .setCount(4096 * 1000)
>>>> >>>>>
>>>> >>>>>     new SlicePredicate().setSlice_range(range)
>>>> >>>>>   }
>>>> >>>>>
>>>> >>>>>   ConfigHelper.setInputSlicePredicate(conf, pred)
>>>> >>>>> }
>>>> >>>>>
>>>> >>>>> The job consists only of a mapper that increments counters for each
>>>> >>>>> row and its associated columns, so all I'm really doing is exercising
>>>> >>>>> ColumnFamilyRecordReader.
>>>> >>>>>
>>>> >>>>> Has anyone else seen this?  Is there a workaround/fix to get our
>>>> >>>>> jobs running?
>>>> >>>>>
>>>> >>>>> Thanks
>>>> >>>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >
>>>>
>>>>
>>>
>>>
>>> --
>>> alex p
>>>
>>
>>
>>
>> --
>> alex p
>>
>
>
>
> --
> alex p
>



-- 
alex p
