hadoop-hdfs-user mailing list archives

From Daning Wang <dan...@netseer.com>
Subject Re: Hadoop cluster hangs on big hive job
Date Mon, 11 Mar 2013 20:32:37 GMT
[hive@mr3-033 ~]$ hadoop version
Hadoop 1.0.4
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290
Compiled by hortonfo on Wed Oct  3 05:13:58 UTC 2012


On Sun, Mar 10, 2013 at 8:16 AM, Suresh Srinivas <suresh@hortonworks.com> wrote:

> What is the version of Hadoop?
>
> Sent from phone
>
> On Mar 7, 2013, at 11:53 AM, Daning Wang <daning@netseer.com> wrote:
>
> We have a Hive query that processes zipped CSV files. The query scans
> 10 days of data (partitioned by date), and the data for each day is around
> 130 GB. The problem is not consistent: if you run the query again, it might
> go through. But the problem has never happened on smaller jobs (such as
> processing only one day's data).
>
> We don't have a space issue.
>
> I have attached the log file from when the problem happened. It gets stuck
> like the following (just search for "19706 of 49964"):
>
> 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
>
> Thanks,
>
> Daning
>
>
> On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård <
> haavard.kongsgaard@gmail.com> wrote:
>
>> hadoop logs?
>> On 6 Mar 2013 21:04, "Daning Wang" <daning@netseer.com> wrote:
>>
>>> We have a 5-node cluster (Hadoop 1.0.4). It hung a couple of times while
>>> running big jobs. Basically all the nodes were dead, and from the
>>> TaskTracker's log it looks like it went into some kind of endless loop.
>>>
>>> All the log entries looked like this when the problem happened:
>>>
>>> Any idea how to debug the issue?
>>>
>>> Thanks in advance.
>>>
>>>
>>> 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000016_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:25,336 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000004_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:25,539 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000043_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:25,545 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:25,569 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:25,855 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:26,876 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:27,159 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000016_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:27,505 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:28,464 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000032_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:28,553 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000043_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:28,561 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000012_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:28,659 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:30,519 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:30,644 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:30,741 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000039_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:31,369 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000004_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:31,675 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:31,875 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000024_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:32,372 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000028_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>> 2013-03-05 15:13:32,893 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_000036_0 0.131468% reduce > copy (19706 of
>>> 49964 at 0.00 MB/s) >
>>>
>>>
> <hadoop-hadoop3-tasktracker.log.gz>
>
>
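Every quoted log entry has the same shape: each reduce attempt keeps reporting the same copy count (19706 of 49964) at 0.00 MB/s, which suggests the shuffle (copy) phase has stopped fetching map outputs rather than the TaskTracker looping on real work. Below is a minimal Python sketch for spotting that pattern in a TaskTracker log. The helper names (`COPY_RE`, `find_stalled_reducers`, `expected_progress`) are hypothetical, and the sketch assumes each entry sits on a single line as in the raw log file, not wrapped as in this mail. It also assumes, as in Hadoop 1.x, that a reduce attempt weights its copy, sort and reduce phases equally. That assumption is consistent with the logged figure, since 19706 / 49964 / 3 ≈ 0.131468, which suggests the number printed before "%" is a 0-1 fraction (about 13% overall) rather than a percentage.

```python
import re
from collections import defaultdict

# Hypothetical regex for a Hadoop 1.x TaskTracker reduce-shuffle status entry,
# modeled on the lines quoted in this thread. Assumes one entry per line.
COPY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_\S+_r_\d+_\d+) (?P<progress>[\d.]+)% reduce > copy "
    r"\((?P<copied>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)

# Entries copied from the log in this thread, rejoined onto single lines.
SAMPLE = [
    "2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:25,539 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000043_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:27,505 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000019_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >",
    "2013-03-05 15:13:28,659 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201302270947_0010_r_000000_0 0.131468% reduce > copy (19706 of 49964 at 0.00 MB/s) >",
]


def find_stalled_reducers(lines):
    """Hypothetical helper: return {attempt: (copied, total)} for reduce attempts
    seen more than once whose copy count never advanced and whose reported
    transfer rate was always 0.00 MB/s."""
    samples = defaultdict(list)
    for line in lines:
        m = COPY_RE.match(line.strip())
        if m:
            samples[m["attempt"]].append(
                (int(m["copied"]), int(m["total"]), float(m["rate"]))
            )
    stalled = {}
    for attempt, seen in samples.items():
        counts = {copied for copied, _, _ in seen}
        # A single sample cannot show a stall, so require at least two.
        if len(seen) > 1 and len(counts) == 1 and all(rate == 0.0 for _, _, rate in seen):
            copied, total, _ = seen[0]
            stalled[attempt] = (copied, total)
    return stalled


def expected_progress(copied, total):
    """Overall reduce progress during the copy phase, assuming copy, sort and
    reduce each count for one third of the task (as in Hadoop 1.x)."""
    return copied / total / 3


if __name__ == "__main__":
    for attempt, (copied, total) in sorted(find_stalled_reducers(SAMPLE).items()):
        print(f"{attempt}: stuck at {copied} of {total} "
              f"(expected progress {expected_progress(copied, total):.6f})")
```

On the sample, r_000019_0 and r_000000_0 are flagged because each appears twice with no change, while r_000043_0 appears only once and is skipped. To scan the attached log instead, something like `find_stalled_reducers(gzip.open(path, "rt"))` should work, where `path` is a hypothetical location of the downloaded .gz file.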
