hadoop-common-user mailing list archives

From yaotian <yaot...@gmail.com>
Subject Re: I am running MapReduce on 30G of data on 1 master/2 slaves, but it failed.
Date Tue, 15 Jan 2013 08:34:27 GMT
I set mapred.reduce.tasks from -1 to "AutoReduce".
Hadoop then created 450 tasks for the map phase, but only 1 task for the
reduce. It seems that this reduce ran on only 1 slave (I have two slaves).
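
For reference, a minimal sketch of pinning the reduce count in the job driver
(new mapreduce API; the count of 2, one per slave, is just illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  Configuration conf = new Configuration();
  Job job = new Job(conf, "gps-sort");  // "gps-sort" is an illustrative name
  job.setNumReduceTasks(2);             // one reducer per slave node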

But when it reached 66%, the error was reported again: "Task
attempt_201301150318_0001_r_000000_0 failed to report status for 601
seconds. Killing!"



2013/1/14 yaotian <yaotian@gmail.com>

> How do I judge which counter would work?
>
>
> 2013/1/11 <bejoy.hadoop@gmail.com>
>
>> Hi
>>
>> To add on to Harsh's comments.
>>
>> You don't need to change the task timeout.
>>
>> In your map/reduce code, you can increment a counter or report status at
>> intervals, so that there is communication from the task and hence no task
>> timeout.
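>>
>> For example, a minimal sketch of a reducer that keeps the tracker informed
>> (new mapreduce API; the counter group/name and the 10,000-record interval
>> are just illustrative):
>>
>>   import java.io.IOException;
>>   import org.apache.hadoop.io.Text;
>>   import org.apache.hadoop.mapreduce.Reducer;
>>
>>   public class GpsReducer extends Reducer<Text, Text, Text, Text> {
>>     @Override
>>     protected void reduce(Text uid, Iterable<Text> points, Context context)
>>         throws IOException, InterruptedException {
>>       long n = 0;
>>       for (Text point : points) {
>>         // ... per-record work ...
>>         if (++n % 10000 == 0) {
>>           context.getCounter("GpsJob", "RECORDS").increment(10000);
>>           context.setStatus("uid " + uid + ": " + n + " values seen");
>>           context.progress();  // heartbeat to the tracker; resets the timeout
>>         }
>>       }
>>     }
>>   }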
>>
>> Every map and reduce task runs in its own JVM, limited by the JVM heap
>> size. If you try to hold too much data in memory, it can exceed the JVM
>> size and cause OOM errors.
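>>
>> If a task genuinely needs more memory, the per-task JVM heap is controlled
>> by mapred.child.java.opts; a sketch, with an illustrative 1024m value:
>>
>>   Configuration conf = new Configuration();
>>   // Flags passed to each task JVM; -Xmx1024m is illustrative, not a recommendation.
>>   conf.set("mapred.child.java.opts", "-Xmx1024m");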
>>
>> Regards
>> Bejoy KS
>>
>> Sent from remote device. Please excuse typos.
>> ------------------------------
>> From: yaotian <yaotian@gmail.com>
>> Date: Fri, 11 Jan 2013 14:35:07 +0800
>> To: <user@hadoop.apache.org>
>> Reply-To: user@hadoop.apache.org
>> Subject: Re: I am running MapReduce on 30G of data on 1 master/2 slaves,
>> but it failed.
>>
>> See inline.
>>
>>
>> 2013/1/11 Harsh J <harsh@cloudera.com>
>>
>>> If the per-record processing time is very high, you will need to
>>> periodically report a status. Without a status report from the task to
>>> the tracker, it will be killed as a dead task after the default timeout
>>> of 10 minutes (600s).
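>>>
>>> (If you did want to raise that timeout instead of reporting status, a
>>> sketch, value illustrative:)
>>>
>>>   // mapred.task.timeout is in milliseconds; the default is 600000 (10 min).
>>>   conf.setLong("mapred.task.timeout", 1800000L);  // 30 minutes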
>>>
>> =====================> Do you mean to increase the report timeout,
>> "mapred.task.timeout"?
>>
>>
>>> Also, beware of holding too much memory in a reduce JVM - you're still
>>> limited there. Best to let the framework do the sort or secondary sort.
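>>>
>>> A sketch of the job wiring for such a secondary sort (UidTimePair,
>>> UidPartitioner, UidTimeComparator and UidGroupingComparator are
>>> hypothetical classes you would write yourself):
>>>
>>>   // Composite key (uid, timestamp) lets the shuffle sort the GPS points,
>>>   // so the reducer never has to buffer a whole user's trace in memory.
>>>   job.setMapOutputKeyClass(UidTimePair.class);
>>>   job.setPartitionerClass(UidPartitioner.class);          // partition on uid only
>>>   job.setSortComparatorClass(UidTimeComparator.class);    // sort by (uid, time)
>>>   job.setGroupingComparatorClass(UidGroupingComparator.class);  // group on uid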
>>>
>> =======================> You mean use the default value? This is my
>> value: mapred.job.reduce.memory.mb = -1
>>
>>>
>>>
>>> On Fri, Jan 11, 2013 at 10:58 AM, yaotian <yaotian@gmail.com> wrote:
>>>
>>>> Yes, you are right. The data is GPS traces keyed by the corresponding uid.
>>>> The reduce is doing this: sorting per user to get this kind of result:
>>>> uid, gps1, gps2, gps3........
>>>> Yes, the GPS data is big because this is 30G of data.
>>>>
>>>> How to solve this?
>>>>
>>>>
>>>>
>>>> 2013/1/11 Mahesh Balija <balijamahesh.mca@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>>           2 reducers completed successfully and 1498 have been killed.
>>>>> I assume that you have data issues (either the data is huge or there are
>>>>> some issues with the data you are trying to process).
>>>>>           One possibility could be that you have many values associated
>>>>> with a single key, which can cause this kind of issue depending on the
>>>>> operation you do in your reducer.
>>>>>           Can you put some logging in your reducer and try to trace
>>>>> what is happening?
>>>>>
>>>>> Best,
>>>>> Mahesh Balija,
>>>>> Calsoft Labs.
>>>>>
>>>>>
>>>>> On Fri, Jan 11, 2013 at 8:53 AM, yaotian <yaotian@gmail.com> wrote:
>>>>>
>>>>>> I have 1 Hadoop master, where the namenode runs, and 2 slaves, where
>>>>>> the datanodes run.
>>>>>>
>>>>>> If I choose a small dataset like 200M, it can be done.
>>>>>>
>>>>>> But if I run 30G of data, the map is done but the reduce reports an
>>>>>> error. Any suggestion?
>>>>>>
>>>>>>
>>>>>> This is the information.
>>>>>>
>>>>>> Black-listed TaskTrackers: 1
>>>>>> ------------------------------
>>>>>> Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
>>>>>> map      100.00%      450         0         0         450        0        0 / 1
>>>>>> reduce   100.00%      1500        0         0         2          1498     12 / 3
>>>>>>
>>>>>> Task: task_201301090834_0041_r_000001 (0.00% complete)
>>>>>> Start: 10-Jan-2013 04:18:54   Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 44sec)   Counters: 0
>>>>>> Errors:
>>>>>>   Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>>>>>>   Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>>>>>>   Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>>>>>>   Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
>>>>>>
>>>>>> Task: task_201301090834_0041_r_000002 (0.00% complete)
>>>>>> Start: 10-Jan-2013 04:18:54   Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 43sec)   Counters: 0
>>>>>> Errors:
>>>>>>   Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>>>>>>   Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
>>>>>>
>>>>>> Task: task_201301090834_0041_r_000003 (0.00% complete)
>>>>>> Start: 10-Jan-2013 04:18:57   Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 41sec)   Counters: 0
>>>>>> Errors:
>>>>>>   Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>>>>>>   Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>>>>>>   Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
>>>>>>
>>>>>> Task: task_201301090834_0041_r_000005 (0.00% complete)
>>>>>> Start: 10-Jan-2013 06:11:07   Finish: 10-Jan-2013 06:46:38 (35mins, 31sec)   Counters: 0
>>>>>> Errors:
>>>>>>   Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
