From: Joseph Stein
To: user@cassandra.apache.org
Date: Fri, 7 May 2010 08:49:01 -0400
Subject: Re: timeout while running simple hadoop job

The problem could be that you are crunching more data than can be
processed within the expiry interval setting.

In Hadoop you need to tell the task tracker that you are still doing
work, which is done by setting a status or incrementing a counter on
the Reporter object.

http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/

"In your Java code there is a little trick to help the job be “aware”
within the cluster of tasks that are not dead but just working hard.
During execution of a task there is no built-in reporting that the job
is running as expected if it is not writing output. This means that if
your tasks are taking up a lot of time doing work, it is possible the
cluster will see that task as failed (based on the
mapred.task.tracker.expiry.interval setting).

Have no fear, there is a way to tell the cluster that your task is
doing just fine. You have two ways to do this: you can either report
a status or increment a counter. Both of these will let the task
tracker know the task is OK, and this in turn gets seen by the job
tracker. Both of these options are explained in the JavaDoc:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html"
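Something like this, as a rough untested sketch (SlowMapper and the
counter group/name are made-up examples, using the old
org.apache.hadoop.mapred API that Reporter belongs to):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical mapper that does expensive work per record and may
    // emit nothing for long stretches, so it reports progress itself.
    public class SlowMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, LongWritable> output,
                        Reporter reporter) throws IOException {
            // ... long-running work that produces no output for a while ...

            // Option 1: report a status string; this counts as progress,
            // so the task tracker will not mark the task as expired.
            reporter.setStatus("still working on " + key);

            // Option 2: increment a counter; this also counts as progress.
            reporter.incrCounter("MyJob", "rows-processed", 1L);
        }
    }

If you are on the newer org.apache.hadoop.mapreduce API (which the
stack trace below suggests), the equivalents would be
context.setStatus(...), context.getCounter("MyJob",
"rows-processed").increment(1), or simply context.progress().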
Hope this helps

On Fri, May 7, 2010 at 4:47 AM, gabriele renzi wrote:
> Hi everyone,
>
> I am trying to develop a mapreduce job that does a simple
> selection+filter on the rows in our store.
> Of course it is mostly based on the WordCount example :)
>
> Sadly, while the app seems to run fine on a test keyspace with little
> data, when run on a larger test index (but still on a single node) I
> reliably see this error in the logs:
>
> 10/05/06 16:37:58 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.RuntimeException: TimedOutException()
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
> Caused by: TimedOutException()
>         at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>         at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>         ... 11 more
>
> and after that the job seems to finish "normally", but no results are produced.
>
> FWIW this is on 0.6.0 (we didn't move to 0.6.1 yet because, well, if
> it ain't broke don't fix it).
>
> The single node has a data directory of about 127GB in two column
> families, of which the one used in the mapred job is about 100GB.
> The Cassandra server is run with 6GB of heap on a box with 8GB
> available and no swap enabled. Read/write latencies from cfstats are:
>
>         Read Latency: 0.8535837762577986 ms.
>         Write Latency: 0.028849603764075547 ms.
>
> The row cache is not enabled and the key cache percentage is the
> default. Load on the machine is basically zero when the job is not
> running.
>
> As my code is 99% that of the wordcount contrib, I should note that
> in 0.6.1's contrib (and trunk) there is a RING_DELAY constant that we
> can supposedly change, but it's apparently not used anywhere; as I
> said, though, running on a single node this should not be an issue
> anyway.
>
> Does anyone have suggestions, or has anyone seen this error before?
> On the other hand, have people run this kind of job in similar
> conditions flawlessly, so that I can consider it just my problem?
>
> Thanks in advance for any help.

--
/* Joe Stein
http://www.linkedin.com/in/charmalloc
*/