hadoop-mapreduce-user mailing list archives

From Mayuresh <mayuresh.kshirsa...@gmail.com>
Subject Re: Debugging killed task attempts
Date Thu, 02 Jun 2011 10:59:13 GMT
Ok, some more progress. One thing I noticed is that this machine may not be
holding any data yet:

Name: 10.206.73.198:50010
Decommission Status : Normal
Configured Capacity: 359313149952 (334.64 GB)
DFS Used: 2478612 (2.36 MB)
Non DFS Used: 18468232684 (17.2 GB)
DFS Remaining: 340842438656(317.43 GB)
DFS Used%: 0%
DFS Remaining%: 94.86%
Last contact: Thu Jun 02 10:55:22 UTC 2011


Is there a way to force Hadoop/HBase to balance some data onto this node? I
tried running the HDFS balancer, but it had no effect.
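
For reference, a minimal sketch of the commands involved, assuming a 0.20-era
CLI (the per-node report above is what "hadoop dfsadmin -report" prints, and
the threshold of 5 below is just an illustrative value):

    # Per-datanode usage report (the source of the numbers above):
    hadoop dfsadmin -report

    # Spread existing HDFS blocks across datanodes; -threshold is the allowed
    # deviation, in percent, from the average datanode utilization:
    hadoop balancer -threshold 5

Note that the HDFS balancer only moves blocks; it does not reassign HBase
regions, so a freshly added node can stay nearly empty until the HBase master
assigns regions to it and compactions rewrite those regions' files locally.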

On Thu, Jun 2, 2011 at 3:10 PM, Mayuresh <mayuresh.kshirsagar@gmail.com> wrote:

> Additional information:
>
> I see the following logs in the syslog of the task attempt:
>
> 2011-06-02 09:09:41,145 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 10000 rows
> 2011-06-02 09:09:41,145 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 10000 rows
> 2011-06-02 09:09:41,145 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 10000 rows
> 2011-06-02 09:09:41,145 INFO ExecMapper: ExecMapper: processing 10000 rows: used memory = 110421792
> 2011-06-02 09:12:51,340 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 100000 rows
> 2011-06-02 09:12:51,340 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100000 rows
> 2011-06-02 09:12:51,340 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 100000 rows
> 2011-06-02 09:12:51,341 INFO ExecMapper: ExecMapper: processing 100000 rows: used memory = 115097528
> 2011-06-02 09:18:15,591 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2011-06-02 09:18:15,592 WARN org.mortbay.log: Ignore, probably already closed
> java.io.IOException: The client is stopped
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:822)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:716)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
> 	at $Proxy2.close(Unknown Source)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:105)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:72)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:38)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1040)
> 	at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1967)
> 	at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:2092)
> 	at org.apache.hadoop.hbase.mapred.TableInputFormatBase$TableRecordReader.next(TableInputFormatBase.java:222)
> 	at org.apache.hadoop.hbase.mapred.TableInputFormatBase$TableRecordReader.next(TableInputFormatBase.java:91)
> 	at org.apache.hadoop.hive.ql.io.HiveRecordReader.next(HiveRecordReader.java:67)
> 	at org.apache.hadoop.hive.ql.io.HiveRecordReader.next(HiveRecordReader.java:33)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
>
> On Thu, Jun 2, 2011 at 3:06 PM, Mayuresh <mayuresh.kshirsagar@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to scan around 4,600,000 rows of HBase data, using Hive to
>> query them back. I start the job with around 25 maps; there are 11 nodes
>> in my cluster, each running 2 maps at a time.
>>
>> With 7 nodes it took around 7 minutes to scan all this data. However,
>> after I added 4 more nodes, it is taking even more time. For the map task
>> that is taking the longest, I see the following:
>>
>> attempt_201106011013_0010_m_000009_0
>>   Task attempt:    /default-rack/domU-12-31-39-0F-75-13.compute-1.internal
>>   Cleanup attempt: /default-rack/domU-12-31-39-0F-75-13.compute-1.internal
>>   Status: KILLED    Progress: 100.00%
>>   Started: 2-Jun-2011 08:53:16    Finished: 2-Jun-2011 09:01:48 (8mins, 32sec)
>>
>> and
>>
>> attempt_201106011013_0010_m_000009_1
>>   Task attempt: /default-rack/ip-10-196-198-48.ec2.internal
>>   Status: SUCCEEDED    Progress: 100.00%
>>   Started: 2-Jun-2011 08:57:28    Finished: 2-Jun-2011 09:01:44 (4mins, 15sec)
>>
>> The first attempt ran for 8 mins 32 secs before getting killed. I checked
>> the datanode logs, and all I see there is some data coming in and some
>> going out. Can someone point me to how exactly I can debug what was going
>> on, and how I can avoid such long, unproductive task attempts from being
>> run?
>>
>> Thanks,
>> -Mayuresh
>>
>
>
>
> --
> -Mayuresh
>
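
The KILLED/SUCCEEDED pair quoted above for task attempt_201106011013_0010_m_000009
looks like speculative execution: the JobTracker starts a backup attempt on
another node and kills the slower one once the backup finishes. The "The
client is stopped" IOException typically shows up in the killed attempt,
whose HBase connections are shut down while a scanner is still being closed.
If that turns out to be the cause here, a minimal sketch of switching the
behaviour off for a Hive session (assuming Hive passes these Hadoop
properties straight through to the job configuration):

    hive> SET mapred.map.tasks.speculative.execution=false;
    hive> SET mapred.reduce.tasks.speculative.execution=false;

This only stops duplicate attempts from being launched; it does not by itself
explain why the job got slower after adding nodes.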



-- 
-Mayuresh
