From: Utku Can Topçu
Date: Thu, 29 Apr 2010 17:54:42 +0200
Subject: Re: TimedOutException when using the ColumnFamilyInputFormat
To: user@cassandra.apache.org

Hello Jeff,

Thank you for your comments, but the problem is not about the RangeBatchSize.

With the configuration parameter mapred.tasktracker.map.tasks.maximum > 1, all the map tasks time out; they don't even run a single line of code in the Mapper.map() function.

With mapred.tasktracker.map.tasks.maximum = 1, map tasks run one by one on the tasktracker, and they finish without any problem at all.

I guess there's some kind of concurrency problem in the Cassandra/Hadoop integration.

I'm using Cassandra 0.6.1 and Hadoop 0.20.2.

Best Regards,
Utku

On Thu, Apr 29, 2010 at 5:03 PM, Joost Ouwerkerk wrote:
> The default batch size is 4096, which means that each call to
> get_range_slices retrieves 4,096 rows. I have found that this causes
> timeouts when Cassandra is under load. Try reducing the batch size
> with a call to ConfigHelper.setRangeBatchSize(). This has eliminated
> the TimedOutExceptions for us.
> joost.
>
> On Thu, Apr 29, 2010 at 10:25 AM, Utku Can Topçu wrote:
> > Hey All,
> >
> > I'm trying to run some tests on Cassandra and Hadoop integration. I'm
> > basically following the word count example at
> > https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java
> > using the ColumnFamilyInputFormat.
> >
> > Currently I have a one-node Cassandra and Hadoop setup on the same machine.
> >
> > I'm having problems if there is more than one map task running on the same
> > node; please find a copy of the error message below.
> >
> > If I limit the map tasks per tasktracker to 1, the MapReduce works fine
> > without any problems at all.
> >
> > Do you think it's a known issue, or am I doing something wrong in the
> > implementation?
> >
> > ---------------error----------------
> > 10/04/29 13:47:37 INFO mapred.JobClient: Task Id :
> > attempt_201004291109_0024_m_000000_1, Status : FAILED
> > java.lang.RuntimeException: TimedOutException()
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
> >     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
> >     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
> >     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > Caused by: TimedOutException()
> >     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
> >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
> >     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
> >     ... 11 more
> > ---------------------------------------
> >
> >
> > Best Regards,
> > Utku
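For anyone landing on this thread later: Joost's suggestion boils down to one extra call in the job driver before submitting the job. A minimal sketch against the 0.6-era ColumnFamilyInputFormat API is below; the keyspace name, column family name, batch size of 256, and the driver class itself are placeholders for illustration, not values taken from this thread, and the other required input settings are elided.

```java
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver sketch (Cassandra 0.6.x / Hadoop 0.20.x APIs).
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Joost's fix: shrink the number of rows fetched per
        // get_range_slices call (default 4096), so each Thrift round trip
        // does less work and is less likely to hit TimedOutException
        // when Cassandra is under load. 256 is an arbitrary example value.
        ConfigHelper.setRangeBatchSize(conf, 256);

        // Placeholder keyspace/column family names.
        ConfigHelper.setColumnFamily(conf, "Keyspace1", "Standard1");
        // ... slice predicate, Thrift contact address, mapper/reducer and
        // output settings as in the contrib WordCount example ...

        Job job = new Job(conf, "wordcount");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        job.waitForCompletion(true);
    }
}
```

Note this is a job-level configuration sketch that needs a running Cassandra node and Hadoop cluster; it does not address the tasktracker-side workaround Utku describes (setting mapred.tasktracker.map.tasks.maximum to 1 in mapred-site.xml), which serializes map tasks per node rather than fixing the timeout itself.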