hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Suresh Srinivas <sur...@hortonworks.com>
Subject Re: Hadoop cluster hangs on big hive job
Date Sun, 10 Mar 2013 15:16:31 GMT
What is the version of hadoop?

Sent from phone

On Mar 7, 2013, at 11:53 AM, Daning Wang <daning@netseer.com> wrote:

> We have hive query processing zipped csv files. the query was scanning for 10 days(partitioned
by date). data for each day around 130G. The problem is not consistent since if you run it
again, it might go through. but the problem has never happened on the smaller jobs(like processing
only one days data).
> 
> We don't have space issue.
> 
> I have attached log file when problem happening. it is stuck like following(just search
"19706 of 49964")
> 
> 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000019_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000039_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000032_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000000_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000024_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000008_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >
> 
> Thanks,
> 
> Daning
> 
> 
> On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård <haavard.kongsgaard@gmail.com>
wrote:
>> hadoop logs?
>> 
>> On 6. mars 2013 21:04, "Daning Wang" <daning@netseer.com> wrote:
>>> We have 5 nodes cluster(Hadoop 1.0.4), It hung a couple of times while running
big jobs. Basically all the nodes are dead, from that trasktracker's log looks it went into
some kinds of loop forever.
>>> 
>>> All the log entries like this when problem happened.
>>> 
>>> Any idea how to debug the issue?
>>> 
>>> Thanks in advance.
>>> 
>>> 
>>> 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000012_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000028_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000036_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000016_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000019_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000039_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000032_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000000_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000024_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000008_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000039_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:25,336 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000004_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:25,539 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000043_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:25,545 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000012_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:25,569 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000028_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:25,855 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000024_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:26,876 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000036_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:27,159 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000016_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:27,505 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000019_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:28,464 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000032_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:28,553 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000043_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:28,561 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000012_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:28,659 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000000_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:30,519 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000019_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:30,644 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000008_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:30,741 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000039_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:31,369 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000004_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:31,675 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000000_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:31,875 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000024_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:32,372 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000028_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
>>> 2013-03-05 15:13:32,893 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201302270947_0010_r_000036_0
0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) > 
> 
> <hadoop-hadoop3-tasktracker.log.gz>

Mime
View raw message