From: gabriele renzi <>
Subject: timeout while running simple hadoop job
Date: Fri, 07 May 2010 08:47:39 GMT
Hi everyone,

I am trying to develop a mapreduce job that does a simple
selection+filter on the rows in our store.
Of course it is mostly based on the WordCount example :)
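For context, the per-row work of a selection+filter job like this might look roughly like the dependency-free Java sketch below. It is hypothetical. The row layout (a sorted map of column name to value per row key), the `status`/`name` columns and `selectAndFilter` are all invented for illustration. A real job would receive rows through the Hadoop `Mapper` API and Cassandra's input format rather than a plain `Map`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical, dependency-free sketch of per-row "selection + filter" logic:
// keep rows matching a predicate (filter) and emit one chosen column (selection).
// Column names and the row layout are invented for illustration.
public class SelectFilterSketch {

    // Returns "rowKey=value" for each row that passes the filter and
    // actually contains the selected column.
    static List<String> selectAndFilter(Map<String, SortedMap<String, String>> rows,
                                        Predicate<SortedMap<String, String>> filter,
                                        String selectedColumn) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, SortedMap<String, String>> row : rows.entrySet()) {
            SortedMap<String, String> columns = row.getValue();
            if (filter.test(columns) && columns.containsKey(selectedColumn)) {
                out.add(row.getKey() + "=" + columns.get(selectedColumn));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, SortedMap<String, String>> rows = new TreeMap<>();
        SortedMap<String, String> a = new TreeMap<>();
        a.put("status", "active");
        a.put("name", "alice");
        SortedMap<String, String> b = new TreeMap<>();
        b.put("status", "deleted");
        b.put("name", "bob");
        rows.put("row1", a);
        rows.put("row2", b);

        // Keep only "active" rows and select their "name" column.
        List<String> result = selectAndFilter(rows, cols -> "active".equals(cols.get("status")), "name");
        System.out.println(result);
    }
}
```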

Sadly, while the app seems to run fine on a test keyspace with little
data, when I run it on a larger test index (still on a single node) I
reliably see this error in the logs:

10/05/06 16:37:58 WARN mapred.LocalJobRunner: job_local_0001
java.lang.RuntimeException: TimedOutException()
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(
        at org.apache.hadoop.mapred.MapTask.runNewMapper(
        at org.apache.hadoop.mapred.LocalJobRunner$
Caused by: TimedOutException()
        at org.apache.cassandra.thrift.Cassandra$
        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(
        at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(
        ... 11 more

After that, the job seems to finish "normally", but no results are produced.
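The trace shows the timeout coming from the `get_range_slices` call that `ColumnFamilyRecordReader` makes while paging through rows. One plausible mitigation for this kind of timeout is to ask for fewer rows per call. The dependency-free Java sketch below illustrates that idea under stated assumptions. `RangeSource`, `fakeSource` and `FetchTimeoutException` are hypothetical stand-ins invented for this example, not Cassandra's Thrift API. Whether 0.6.0's record reader lets you configure its batch size is not established here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Cassandra code: RangeSource and FetchTimeoutException
// stand in for a get_range_slices-style call and its TimedOutException.
public class BatchedRangeScan {

    // Unchecked stand-in for a server-side timeout.
    static class FetchTimeoutException extends RuntimeException {
        FetchTimeoutException(String msg) { super(msg); }
    }

    interface RangeSource {
        // Up to `count` keys strictly after `afterKey` (from the start when null), sorted.
        List<String> fetch(String afterKey, int count);
    }

    // Pages through the whole range, halving the batch size after each timeout
    // and rethrowing once the batch cannot shrink below minBatch.
    static List<String> scanAll(RangeSource source, int initialBatch, int minBatch) {
        List<String> all = new ArrayList<>();
        String last = null;
        int batch = initialBatch;
        while (true) {
            List<String> page;
            try {
                page = source.fetch(last, batch);
            } catch (FetchTimeoutException e) {
                if (batch <= minBatch) throw e;
                batch = Math.max(minBatch, batch / 2);
                continue; // retry the same position with a smaller batch
            }
            all.addAll(page);
            if (page.size() < batch) return all; // a short page means the range is exhausted
            last = page.get(page.size() - 1);    // resume after the last key seen
        }
    }

    // In-memory source over sorted keys that "times out" above maxBatch rows.
    static RangeSource fakeSource(List<String> sortedKeys, int maxBatch) {
        return (afterKey, count) -> {
            if (count > maxBatch) throw new FetchTimeoutException("batch of " + count + " too large");
            int start = afterKey == null ? 0 : sortedKeys.indexOf(afterKey) + 1;
            int end = Math.min(start + count, sortedKeys.size());
            return new ArrayList<>(sortedKeys.subList(start, end));
        };
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) keys.add(String.format("k%03d", i));
        List<String> rows = scanAll(fakeSource(keys, 16), 128, 4);
        System.out.println(rows.size() + " rows, same as input: " + rows.equals(keys));
    }
}
```

The scan halves the batch on each timeout instead of failing outright. It keeps a lower bound so that a server that times out on every request still surfaces the error rather than retrying forever.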

FWIW, this is on Cassandra 0.6.0 (we haven't moved to 0.6.1 yet because,
well, if it ain't broke, don't fix it).

The single node has a data directory of about 127GB split across two
column families, of which the one used in the mapred job is about 100GB.
The Cassandra server runs with a 6GB heap on a box with 8GB of memory
and no swap enabled. Read/write latencies from cfstats are:

        Read Latency: 0.8535837762577986 ms.
        Write Latency: 0.028849603764075547 ms.

Row cache is not enabled, and the key cache percentage is the default.
Load on the machine is basically zero when the job is not running.
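Here is a rough back-of-envelope check that uses only the figures above. With 6GB of the 8GB box given to the heap, at most about 2GB remains for the OS and its page cache. That is roughly 2% of the 100GB column family the job scans. As an assumption, not something measured here, this suggests a full scan is served largely from disk, so the sub-millisecond cfstats read latency may not reflect what a full range scan costs.

```java
// Back-of-envelope memory arithmetic using only the figures quoted in the post.
public class MemoryBudget {

    // Memory not claimed by the JVM heap: an upper bound on what the OS page
    // cache can use, since the OS and other processes need some of it too.
    static double offHeapGb(double boxGb, double heapGb) {
        return boxGb - heapGb;
    }

    // Largest share of a column family that could fit in that leftover memory.
    static double cacheableFraction(double freeGb, double columnFamilyGb) {
        return Math.min(1.0, freeGb / columnFamilyGb);
    }

    public static void main(String[] args) {
        double free = offHeapGb(8.0, 6.0);             // 8GB box, 6GB heap
        double share = cacheableFraction(free, 100.0); // 100GB column family used by the job
        double otherCf = 127.0 - 100.0;                // rest of the 127GB data directory
        System.out.printf("free: %.1f GB, cacheable share: %.0f%%, other CF: %.1f GB%n",
                free, share * 100, otherCf);
    }
}
```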

Since my code is 99% the word count example from contrib, I should note
that in 0.6.1's contrib (and in trunk) there is a RING_DELAY constant
that can supposedly be changed, but it apparently isn't used anywhere.
In any case, as I said, I'm running on a single node, so this should not
be an issue.

Does anyone have suggestions, or has anyone seen this error before?
Conversely, have people run this kind of job flawlessly under similar
conditions, so that I can consider it just my problem?

Thanks in advance for any help.

