hadoop-mapreduce-user mailing list archives

From: Harsh J <ha...@cloudera.com>
Subject: Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.
Date: Fri, 11 Jan 2013 06:13:50 GMT
If the per-record processing time is very high, you will need to
periodically report status. Without a status report from the task
to the tracker, it will be killed as a dead task after the default
timeout of 10 minutes (600s).
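
A minimal sketch of that pattern (the reducer shape, key/value types, and the
per-record work are assumptions; only the progress call is the point):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer: reports progress from inside the reduce loop so the
// TaskTracker keeps seeing a heartbeat even when each record is slow.
public class SlowRecordReducer extends Reducer<Text, Text, Text, Text> {

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    long seen = 0;
    for (Text value : values) {
      // ... expensive per-record processing here ...

      if (++seen % 1000 == 0) {
        // Tell the tracker this attempt is alive; without this the attempt
        // is killed after mapred.task.timeout (600000 ms by default).
        context.progress();
        context.setStatus("processed " + seen + " values for key " + key);
      }
    }
  }
}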

Also, beware of holding too much data in memory in a reduce JVM - you're still
limited there. It's best to let the framework do the sort, or use a secondary sort.
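
As a rough sketch of what that looks like for the "uid -> gps1, gps2, ..." case
(types and names assumed, not from the original job): emit each point as it is
consumed instead of accumulating the whole trace in memory, and if the points
must come out time-ordered, push that ordering into a secondary sort (composite
uid+timestamp key, with the partitioner and grouping comparator on uid) rather
than sorting inside the reducer.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: writes one (uid, gps) record per value instead of building a
// single huge "uid, gps1, gps2, ..." string, so the reduce JVM never holds a
// whole user's trace at once.
public class StreamingGpsReducer extends Reducer<Text, Text, Text, Text> {

  @Override
  protected void reduce(Text uid, Iterable<Text> gpsPoints, Context context)
      throws IOException, InterruptedException {
    for (Text gps : gpsPoints) {
      context.write(uid, gps);
    }
  }
}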


On Fri, Jan 11, 2013 at 10:58 AM, yaotian <yaotian@gmail.com> wrote:

> Yes, you are right. The data is GPS traces tied to their corresponding uids.
> The reducer sorts by user to get this kind of result: uid, gps1,
> gps2, gps3........
> Yes, the GPS data is big because this is 30G of data.
>
> How do I solve this?
>
>
>
> 2013/1/11 Mahesh Balija <balijamahesh.mca@gmail.com>
>
>> Hi,
>>
>>           2 reducers completed successfully and 1498 have been
>> killed. I assume you have data issues (either the data is huge, or
>> there is some issue with the data you are trying to process).
>>           One possibility could be that you have many values associated
>> with a single key, which can cause this kind of issue depending on the
>> operation you do in your reducer.
>>           Can you put some logs in your reducer and try to trace out what
>> is happening?
>>
>> Best,
>> Mahesh Balija,
>> Calsoft Labs.
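
One way to do the kind of tracing Mahesh suggests, assuming the same Text/Text
reducer shape as above (names are illustrative): log and count the number of
values each key receives, so a pathologically large key shows up in the task
logs and on the job's counters page.

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative tracing reducer: counts values per key and records the totals
// so skewed keys can be spotted without rerunning the job.
public class TracingReducer extends Reducer<Text, Text, Text, Text> {
  private static final Log LOG = LogFactory.getLog(TracingReducer.class);

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    long count = 0;
    for (Text value : values) {
      count++;
      // ... normal reduce work would go here ...
    }
    LOG.info("key=" + key + " valueCount=" + count);
    if (count > 1000000) {
      // A counter shows up on the job details page without digging through logs.
      context.getCounter("Trace", "KeysOverOneMillionValues").increment(1);
    }
  }
}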
>>
>>
>> On Fri, Jan 11, 2013 at 8:53 AM, yaotian <yaotian@gmail.com> wrote:
>>
>>> I have 1 Hadoop master, where the namenode runs, and 2 slaves, where the
>>> datanodes run.
>>>
>>> If I choose a small dataset, like 200M, the job completes.
>>>
>>> But if I run it on 30G of data, the map phase finishes, but the reduce
>>> phase reports errors. Any suggestion?
>>>
>>>
>>> This is the information:
>>>
>>> Black-listed TaskTrackers: 1
>>> ------------------------------
>>> Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
>>> map      100.00%      450         0         0         450        0        0 / 1
>>> reduce   100.00%      1500        0         0         2          1498     12 / 3
>>>
>>>
>>> Task / Complete / Start Time / Finish Time / Errors / Counters
>>>
>>> task_201301090834_0041_r_000001     0.00%
>>> Start Time: 10-Jan-2013 04:18:54    Finish Time: 10-Jan-2013 06:46:38 (2hrs, 27mins, 44sec)
>>> Errors:
>>>   Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
>>> Counters: 0
>>>
>>> task_201301090834_0041_r_000002     0.00%
>>> Start Time: 10-Jan-2013 04:18:54    Finish Time: 10-Jan-2013 06:46:38 (2hrs, 27mins, 43sec)
>>> Errors:
>>>   Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
>>> Counters: 0
>>>
>>> task_201301090834_0041_r_000003     0.00%
>>> Start Time: 10-Jan-2013 04:18:57    Finish Time: 10-Jan-2013 06:46:38 (2hrs, 27mins, 41sec)
>>> Errors:
>>>   Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>>>   Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
>>> Counters: 0
>>>
>>> task_201301090834_0041_r_000005     0.00%
>>> Start Time: 10-Jan-2013 06:11:07    Finish Time: 10-Jan-2013 06:46:38 (35mins, 31sec)
>>> Errors:
>>>   Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!
>>> Counters: 0
>>>
>>
>>
>


-- 
Harsh J
