Date: Fri, 7 Jun 2013 15:02:48 -0700
From: Mark Lewandowski
To: user@cassandra.apache.org
Subject: Cassandra (1.2.5) + Pig (0.11.1) Errors with large column families

I'm currently trying to get Cassandra (1.2.5) and Pig (0.11.1) to play nice together. I'm running a basic script:

    rows = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
    dump rows;

This fails for my column family, which has ~100,000 rows. However, if I modify the script to this:

    rows = LOAD 'cassandra://betable_games/bets' USING CassandraStorage();
    rows = limit rows 7000;
    dump rows;

then it seems to work. 7000 is about as high as I've been able to get it before it fails.
The error I keep getting is:

    2013-06-07 14:58:49,119 [Thread-4] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
    java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 4480
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
    Caused by: org.apache.thrift.TException: Message length exceeded: 4480
        at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
        at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
        at org.apache.cassandra.thrift.Column.read(Column.java:535)
        at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
        at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
        at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
        at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
        ... 13 more

I've seen a similar problem reported on this mailing list with Cassandra 1.2.3; however, the fixes suggested on that thread, increasing thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb in cassandra.yaml, did not appear to have any effect. Has anyone else seen this issue, and how can I fix it?

Thanks,

-Mark
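P.S. For reference, a sketch of the two cassandra.yaml settings that earlier thread suggested raising; the values shown here are just illustrative guesses I experimented with, not ones known to fix the problem:

    # cassandra.yaml (Cassandra 1.2.x) -- Thrift size limits.
    # Defaults are 15 and 16 MB respectively; raised here as an experiment.
    # thrift_max_message_length_in_mb should stay >= the framed transport size.
    thrift_framed_transport_size_in_mb: 60
    thrift_max_message_length_in_mb: 64

Changing these (and restarting the nodes) made no difference to the "Message length exceeded: 4480" error above.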