incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Cassandra + Hadoop - 2 Task attempts with million of rows
Date Fri, 26 Apr 2013 02:03:56 GMT
> 2013-04-23 16:09:17,838 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed ColumnFamilySplit((9197470410121435301, '-1] @[p00nosql02.00, p00nosql01.00])
> Why does the split contain data from two nodes? We have a 6-node Cassandra cluster with co-located Hadoop slaves - every task should get a local input split from its local Cassandra node - am I right?
My understanding is that it may get it locally, but it's not something that has to happen. One of the Hadoop guys will have a better idea.
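For what it's worth, here is a rough sketch (not from this thread; the ConfigHelper values are placeholders for your own cluster) that prints the locations Hadoop sees for each split. getLocations() lists the replicas that own the split's token range; the JobTracker treats them as scheduling hints only and falls back to a rack-local or any free slot when no local one is open.

    import java.util.Arrays;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;

    public class PrintSplitLocations
    {
        public static void main(String[] args) throws Exception
        {
            Job job = new Job(new Configuration());
            Configuration conf = job.getConfiguration();
            // Placeholder settings - point these at your own cluster.
            ConfigHelper.setInputInitialAddress(conf, "p00nosql01.00");
            ConfigHelper.setInputRpcPort(conf, "9160");
            ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
            ConfigHelper.setInputColumnFamily(conf, "MyKeyspace", "MyColumnFamily");

            // Each split lists every replica of its token range, which is
            // why one split can show two hosts when RF > 1.
            for (InputSplit split : new ColumnFamilyInputFormat().getSplits(job))
                System.out.println(split + " @ " + Arrays.toString(split.getLocations()));
        }
    }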

Try reducing cassandra.range.batch.size, and/or if you are using wide rows, enable cassandra.input.widerows.
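A minimal sketch of applying those two settings from a plain Java driver, assuming the stock ConfigHelper API (the keyspace and column family names below are placeholders):

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;

    public class TuneCassandraInput
    {
        public static void main(String[] args)
        {
            Configuration conf = new Configuration();
            // Fetch fewer rows per get_range_slices call so each Thrift
            // request finishes inside rpc_timeout (the default batch is 4096).
            ConfigHelper.setRangeBatchSize(conf, 512);
            // For wide rows, iterate columns rather than whole rows; the
            // final 'true' is what sets cassandra.input.widerows.
            ConfigHelper.setInputColumnFamily(conf, "MyKeyspace", "MyColumnFamily", true);
        }
    }

From Pig, something like pig -Dcassandra.range.batch.size=512 -Dcassandra.input.widerows=true yourscript.pig should carry the same properties into the job configuration, though check how your CassandraStorage version picks them up.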

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/04/2013, at 7:55 PM, Shamim <srecon@yandex.ru> wrote:

> Hello Aaron,
>  I have got the following log from the server (sorry for being late):
> 
> job_201304231203_0004
> 	attempt_201304231203_0004_m_000501_0
> 
> 2013-04-23 16:09:14,196 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2013-04-23 16:09:14,438 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/pigContext <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/pigContext
> 2013-04-23 16:09:14,453 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/dk <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/dk
> 2013-04-23 16:09:14,456 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/META-INF <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/META-INF
> 2013-04-23 16:09:14,459 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/org <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/org
> 2013-04-23 16:09:14,469 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/com <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/com
> 2013-04-23 16:09:14,471 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/.job.jar.crc <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/.job.jar.crc
> 2013-04-23 16:09:14,474 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/job.jar <- /egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/job.jar
> 2013-04-23 16:09:17,329 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
> 2013-04-23 16:09:17,387 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@256ef705
> 2013-04-23 16:09:17,838 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed ColumnFamilySplit((9197470410121435301, '-1] @[p00nosql02.00, p00nosql01.00])
> 2013-04-23 16:09:18,088 INFO org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
> 2013-04-23 16:09:19,784 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: data[12,7],null[-1,-1],filtered[14,11],null[-1,-1],c1[23,5],null[-1,-1],updated[111,10] C:  R:
> 2013-04-23 17:35:11,199 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-04-23 17:35:11,384 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2013-04-23 17:35:11,385 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName cassandra for UID 500 from the native implementation
> 2013-04-23 17:35:11,417 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: TimedOutException()
>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
>        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
>        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
>        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
>        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>        at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: TimedOutException()
>        at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
>        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
>        at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
>        ... 17 more
> 2013-04-23 17:35:11,427 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> 
> These two tasks hung for a long time and crashed with a timeout exception. The very interesting part is as follows:
> 2013-04-23 16:09:17,838 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed ColumnFamilySplit((9197470410121435301, '-1] @[p00nosql02.00, p00nosql01.00])
> Why does the split contain data from two nodes? We have a 6-node Cassandra cluster with co-located Hadoop slaves - every task should get a local input split from its local Cassandra node - am I right?
> 
> -- 
> Best regards
>   Shamim A.
> 
> 24.04.2013, 10:59, "Shamim" <srecon@yandex.ru>:
>> Hello Aaron,
>> We have built our new cluster from scratch with version 1.2 and the Murmur3 partitioner. We are not using vnodes at all.
>> Actually the log is clean and nothing serious; we are still investigating the logs and will post soon if we find something criminal.
>> 
>>>>> Our cluster is evenly partitioned (Murmur3Partitioner)
>>> Murmur3Partitioner is only available in 1.2 and changing partitioners is not supported. Did you change from RandomPartitioner under 1.1?
>>> Are you using virtual nodes in your 1.2 cluster?
>>>>> We have roughly 97 million rows in our cluster. Why are we getting the above behavior? Do you have any suggestion or clue to troubleshoot this issue?
>>> Can you make some of the logs from the tasks available?
>>> Cheers
>>> ---------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> On 23/04/2013, at 5:50 AM, Shamim wrote:
>>>> We are using Hadoop 1.0.3 and Pig version 0.11.1
>>>> --
>>>> Best regards
>>>> Shamim A.
>>>> 22.04.2013, 21:48, "Shamim":
>>>>> Hello all,
>>>>> Recently we upgraded our cluster (6 nodes) from Cassandra version 1.1.6 to 1.2.1. Our cluster is evenly partitioned (Murmur3Partitioner). We are using Pig to parse and compute aggregate data.
>>>>> 
>>>>> When we submit a job through Pig, what I consistently see is that, while most of the tasks have 20-25k rows assigned each (Map input records), only 2 of them (always 2) get more than 2 million rows. These 2 tasks always reach 100% and then hang for a long time. Also, most of the time we are getting killed tasks (2%) with a TimeoutException.
>>>>> 
>>>>> We increased rpc_timeout to 60000, and also set cassandra.input.split.size=1024, but nothing helped.
>>>>> 
>>>>> We have roughly 97 million rows in our cluster. Why are we getting the above behavior? Do you have any suggestion or clue to troubleshoot this issue? Any help will be highly appreciated. Thanks in advance.
>>>>> --
>>>>> Best regards
>>>>> Shamim A.
>> 
>> --
>> Best regards
>> Shamim A.

