cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: timeout while running simple hadoop job
Date Fri, 07 May 2010 13:29:00 GMT
The whole point is to parallelize to use the available capacity across
multiple machines.  If you go past that point (fairly easy when you
have a single machine) then you're just contending for resources, not
making things faster.
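
For illustration only, here is a minimal sketch of how the number of
concurrent map tasks could be capped on a Hadoop 0.20-era cluster. It
assumes the setting goes in mapred-site.xml on each tasktracker; the
value 2 is a placeholder, not a recommendation for this workload. Note
that the trace quoted below comes from LocalJobRunner, so this
tasktracker-level setting may not affect that particular run.

```xml
<!-- mapred-site.xml (assumed location; adjust for your install) -->
<configuration>
  <property>
    <!-- Upper bound on map tasks a single tasktracker runs at once -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- Illustrative value; size it to the capacity of the node -->
    <value>2</value>
  </property>
</configuration>
```

On a single machine that also hosts the Cassandra node, a low value keeps
the map tasks from competing with Cassandra for the same disk and CPU.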

On Fri, May 7, 2010 at 7:48 AM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> Huh? Isn't that the whole point of using Map/Reduce?
>
> On Fri, May 7, 2010 at 8:44 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> Sounds like you need to configure Hadoop to not create a whole bunch
>> of Map tasks at once.
>>
>> On Fri, May 7, 2010 at 3:47 AM, gabriele renzi <rff.rff@gmail.com> wrote:
>>> Hi everyone,
>>>
>>> I am trying to develop a mapreduce job that does a simple
>>> selection+filter on the rows in our store.
>>> Of course it is mostly based on the WordCount example :)
>>>
>>>
>>> Sadly, while it seems the app runs fine on a test keyspace with little
>>> data, when run on a larger test index (but still on a single node) I
>>> reliably see this error in the logs:
>>>
>>> 10/05/06 16:37:58 WARN mapred.LocalJobRunner: job_local_0001
>>> java.lang.RuntimeException: TimedOutException()
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>>>        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>>>        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>>>        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>>        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
>>> Caused by: TimedOutException()
>>>        at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>>>        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>>>        at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>>>        ... 11 more
>>>
>>> and after that the job seems to finish "normally" but no results are produced.
>>>
>>> FWIW this is on 0.6.0 (we didn't move to 0.6.1 yet because, well, if
>>> it ain't broke don't fix it).
>>>
>>> The single node has a data directory of about 127GB in two column
>>> families, of which the one used in the mapred job is about 100GB.
>>> The cassandra server is run with 6GB of heap on a box with 8GB
>>> available and no swap enabled. Read/write latencies from cfstats are:
>>>
>>>        Read Latency: 0.8535837762577986 ms.
>>>        Write Latency: 0.028849603764075547 ms.
>>>
>>> row cache is not enabled, key cache percentage is default. Load on the
>>> machine is basically zero when the job is not running.
>>>
>>> As my code is 99% that from the wordcount contrib, I should note that
>>> in 0.6.1's contrib (and trunk) there is a RING_DELAY constant that we
>>> can supposedly change, but it's apparently not used anywhere. As I
>>> said, running on a single node this should not be an issue anyway.
>>>
>>> Does anyone have suggestions, or has anyone seen this error before? On
>>> the other hand, have people run this kind of job in similar conditions
>>> flawlessly, so I can consider it just my problem?
>>>
>>>
>>> Thanks in advance for any help.
>>>
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
