Subject: Re: Cassandra (1.2.5) + Pig (0.11.1) Errors with large column families
From: Arthur Zubarev <arthur.zubarev@aol.com>
Reply-To: arthur.zubarev@aol.com
To: user@cassandra.apache.org
Date: Sat, 08 Jun 2013 22:15:04 -0400
Message-ID: <51B3E528.5050006@aol.com>

On 06/07/2013 06:02 PM, Mark Lewandowski wrote:
I'm currently trying to get Cassandra (1.2.5) and Pig (0.11.1) to play nice together.  I'm running a basic script:

rows = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
dump rows;

This fails for my column family which has ~100,000 rows.  However, if I modify the script to this:

rows = LOAD 'cassandra://betable_games/bets' USING CassandraStorage();
rows = limit rows 7000;
dump rows;

Then it seems to work.  7000 is about as high as I've been able to get it before it fails.  The error I keep getting is:

2013-06-07 14:58:49,119 [Thread-4] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 4480
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: org.apache.thrift.TException: Message length exceeded: 4480
at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
at org.apache.cassandra.thrift.Column.read(Column.java:535)
at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
... 13 more


I've seen a similar problem on this mailing list with Cassandra 1.2.3; however, the fixes suggested on that thread (increasing thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb in cassandra.yaml) did not appear to have any effect.  Has anyone else seen this issue, and how can I fix it?
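
For reference, this is roughly what I had set in cassandra.yaml (the values below are just the ones I experimented with, not a recommendation; I believe the 1.2 defaults are 15 and 16 MB, and the nodes need a restart for changes to take effect):

# cassandra.yaml -- illustrative values only
thrift_framed_transport_size_in_mb: 60
thrift_max_message_length_in_mb: 64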

Thanks,

-Mark
Mark,

Reading your email made me wonder if your CF needs the COMPACT STORAGE directive applied, as in the post about the Bulk Loader. In short, define your CF
WITH COMPACT STORAGE
and compaction = {'class': 'LeveledCompactionStrategy'}
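
As a minimal sketch in CQL3 (the column names here are invented for illustration; note that COMPACT STORAGE can only be declared when a table is created, so the CF would have to be recreated and the data reloaded):

CREATE TABLE bets (
    key text PRIMARY KEY,
    value text
) WITH COMPACT STORAGE
  AND compaction = {'class': 'LeveledCompactionStrategy'};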

Hopefully that enables you to read the data in full.
-- 

Regards,

Arthur