Hello Jeff,

Thank you for your comments, but the problem is not the RangeBatchSize.

When the configuration parameter mapred.tasktracker.map.tasks.maximum is greater than 1,
all the map tasks time out; they don't even run a single line of code in the Mapper.map() function.

When the configuration parameter mapred.tasktracker.map.tasks.maximum is 1,
the map tasks run one at a time on the tasktracker and finish without any problem at all.
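
For reference, this is the mapred-site.xml setting I'm using as the workaround (the property name and value are exactly as described above):

```xml
<!-- mapred-site.xml: limit each tasktracker to one concurrent map task -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```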

I suspect there's some kind of concurrency problem in the Cassandra/Hadoop integration.

I'm using Cassandra 0.6.1 and Hadoop 0.20.2.

Best Regards,
Utku


On Thu, Apr 29, 2010 at 5:03 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
The default batch size is 4096, which means that each call to
get_range_slices retrieves 4,096 rows.  I have found that this causes
timeouts when Cassandra is under load.  Try reducing the batch size
with a call to ConfigHelper.setRangeBatchSize().  This has eliminated
the TimedOutExceptions for us.
joost.
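
For example, something along these lines in the job setup (a sketch only; the class name, job name, and the batch size of 1024 are illustrative values, not recommendations):

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class WordCountSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        // Fetch fewer rows per get_range_slices call than the
        // default 4096, so each Thrift request stays under the
        // server's timeout when Cassandra is loaded.
        ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1024);
    }
}
```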

On Thu, Apr 29, 2010 at 10:25 AM, Utku Can Topçu <utku@topcu.gen.tr> wrote:
> Hey All,
>
> I'm trying to run some tests on Cassandra and Hadoop integration. I'm
> basically following the word count example at
> https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java
> using the ColumnFamilyInputFormat.
>
> Currently I have one-node cassandra and hadoop setup on the same machine.
>
> I'm having problems if more than one map task is running on the same
> node; please find a copy of the error message below.
>
> If I limit the map tasks per tasktracker to 1, the MapReduce works fine
> without any problems at all.
>
> Do you think it's a known issue, or am I doing something wrong in my
> implementation?
>
> ---------------error----------------
> 10/04/29 13:47:37 INFO mapred.JobClient: Task Id :
> attempt_201004291109_0024_m_000000_1, Status : FAILED
> java.lang.RuntimeException: TimedOutException()
>     at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>     at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>     at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>     at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>     at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>     at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>     at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>     at
> org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: TimedOutException()
>     at
> org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>     at
> org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>     at
> org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>     at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>     ... 11 more
> ---------------------------------------
>
>
> Best Regards,
> Utku
>