From: Utku Can Topçu
Date: Thu, 29 Apr 2010 17:54:42 +0200
Subject: Re: TimedOutException when using the ColumnFamilyInputFormat
To: user@cassandra.apache.org

Hello Jeff,

Thank you for your comments, but the problem is not about the RangeBatchSize.

With the configuration parameter mapred.tasktracker.map.tasks.maximum > 1, all the map tasks time out; they don't even run a single line of code in the Mapper.map() function.

With mapred.tasktracker.map.tasks.maximum = 1, map tasks run one by one on the tasktracker, and they finish without any problem at all.

I guess there's some kind of concurrency problem in the Cassandra/Hadoop integration.

I'm using Cassandra 0.6.1 and Hadoop 0.20.2.

Best Regards,
Utku

On Thu, Apr 29, 2010 at 5:03 PM, Joost Ouwerkerk wrote:
> The default batch size is 4096, which means that each call to
> get_range_slices retrieves 4,096 rows. I have found that this causes
> timeouts when Cassandra is under load. Try reducing the batch size
> with a call to ConfigHelper.setRangeBatchSize(). This has eliminated
> the TimedOutExceptions for us.
> joost.
>
> On Thu, Apr 29, 2010 at 10:25 AM, Utku Can Topçu wrote:
> > Hey All,
> >
> > I'm trying to run some tests on Cassandra and Hadoop integration. I'm
> > basically following the word count example at
> > https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java
> > using the ColumnFamilyInputFormat.
> >
> > Currently I have a one-node Cassandra and Hadoop setup on the same machine.
> >
> > I'm having problems if there is more than one map task running on the same
> > node; please find a copy of the error message below.
> >
> > If I limit the map tasks per tasktracker to 1, the MapReduce works fine
> > without any problems at all.
> >
> > Do you think it's a known issue, or am I doing something wrong in the
> > implementation?
> >
> > ---------------error----------------
> > 10/04/29 13:47:37 INFO mapred.JobClient: Task Id :
> > attempt_201004291109_0024_m_000000_1, Status : FAILED
> > java.lang.RuntimeException: TimedOutException()
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
> >     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
> >     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
> >     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > Caused by: TimedOutException()
> >     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
> >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
> >     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
> >     ... 11 more
> > ---------------------------------------
> >
> >
> > Best Regards,
> > Utku
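For anyone landing on this thread later: Joost's suggestion boils down to one extra call in the job driver before submitting the job. A minimal sketch against the 0.6-era ColumnFamilyInputFormat API is below; the keyspace name, column family name, batch size of 256, and the driver class itself are placeholders for illustration, not values taken from this thread, and the other required input settings are elided.

```java
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver sketch (Cassandra 0.6.x / Hadoop 0.20.x APIs).
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Joost's fix: shrink the number of rows fetched per
        // get_range_slices call (default 4096), so each Thrift round trip
        // does less work and is less likely to hit TimedOutException
        // when Cassandra is under load. 256 is an arbitrary example value.
        ConfigHelper.setRangeBatchSize(conf, 256);

        // Placeholder keyspace/column family names.
        ConfigHelper.setColumnFamily(conf, "Keyspace1", "Standard1");
        // ... slice predicate, Thrift contact address, mapper/reducer and
        // output settings as in the contrib WordCount example ...

        Job job = new Job(conf, "wordcount");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        job.waitForCompletion(true);
    }
}
```

Note this is a job-level configuration sketch that needs a running Cassandra node and Hadoop cluster; it does not address the tasktracker-side workaround Utku describes (setting mapred.tasktracker.map.tasks.maximum to 1 in mapred-site.xml), which serializes map tasks per node rather than fixing the timeout itself.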