hadoop-mapreduce-dev mailing list archives

From "Zhang Bingjun (Eddy)" <eddym...@gmail.com>
Subject Re: too many 100% mapper does not complete / finish / commit
Date Mon, 02 Nov 2009 10:40:55 GMT
Hi Pallavi, Khurana, and Vasekar,

Thanks a lot for your replies. To add some context: the mapper we are using is
the multithreaded mapper.
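
For reference, here is roughly how the job is wired up (a minimal sketch
against the new org.apache.hadoop.mapreduce API in 0.20.1; CrawlJob, the
thread count, and the paths are placeholders rather than our exact code, and
CrawlMapper is our mapper class, sketched in simplified form further down):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CrawlJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "crawl");
        job.setJarByClass(CrawlJob.class);
        // MultithreadedMapper runs the real mapper in a thread pool.
        job.setMapperClass(MultithreadedMapper.class);
        MultithreadedMapper.setMapperClass(job, CrawlMapper.class);
        MultithreadedMapper.setNumberOfThreads(job, 10);
        job.setNumReduceTasks(0); // map-only: output goes straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}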

To answer your questions:

Pallavi, Khurana: I have checked the logs. The key each stuck task stopped on
is the last key it read in; since the progress is 100%, I suppose that makes
sense. From the stdout log of our mapper, we have confirmed that the map
function completed. After that, no more keys were read in and no further
progress was made, which means the task never completed / committed despite
being at 100%. The number of stuck mappers differs from job to job, but it is
roughly one third to half of them. Because the map function itself clearly
finished, we started to suspect that the MapReduce framework has something to
do with the stuck problem.

Here is the log from stdout:
[entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
Disco</artist></track>
[0] [293419] start creating objects
[1] [293419] start parsing xml
[2] [293419] start updating data
[sleep] [228312]
[error] [228312] java.io.IOException: [error] [228312] reaches the maximum
number of attempts whiling updating
[3] [228312] start collecting output228312
[3.1 done with null] [228312] done228312
[fail] [228312] java.io.IOException: 3.1 throw null228312
[done] [228312] done228312
[sleep] [293419]
[error] [293419] java.io.IOException: [error] [293419] reaches the maximum
number of attempts whiling updating
[3] [293419] start collecting output293419
[3.1 done with null] [293419] done293419
[fail] [293419] java.io.IOException: 3.1 throw null293419
[done] [293419] done293419

Here is the log from the tasktracker:
2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
Soleil
2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
www.China.ie
2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
www.China.ie
2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
Mobile Disco
2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
Mobile Disco
2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
Mobile Disco

From these logs, we can see that the last entry read in is "i bealive" by
"Simian Mobile Disco". The last entry processed in the mapper is the same,
and from the stdout log we can see that the map function has finished.

Vasekar: HDFS is healthy, and we are not storing too many small files in it
yet. The output of the command "hadoop fsck /" is as follows:
Total size:    89114318394 B (Total open files size: 19845943808 B)
 Total dirs:    430
 Total files:   1761 (Files currently being written: 137)
 Total blocks (validated):      2691 (avg. block size 33115688 B) (Total open file blocks (not validated): 309)
 Minimally replicated blocks:   2691 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.802304
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          76
 Number of racks:               1

Could this problem be due to stuck communication between the actual task (the
mapper) and the tasktracker? From the logs, we cannot see anything after the
task gets stuck.
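
For reference, the status strings in the tasktracker log above are the ones a
mapper reports through its context. This is a minimal sketch (new API;
CrawlMapper here is a simplified stand-in for our mapper, with the crawl work
elided):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CrawlMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String entry = value.toString();
        // This string is what the tasktracker logs next to the 1.0% progress.
        context.setStatus(entry);
        context.progress(); // keep-alive ping during long crawl operations
        // ... crawl / parse / update elided ...
        context.write(new Text(entry), new Text("done"));
    }
}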


From: Amandeep Khurana <amansk@gmail.com>
Reply-To: common-user@hadoop.apache.org
To: common-user@hadoop.apache.org
Date: Mon, Nov 2, 2009 at 4:36 PM
Subject: Re: too many 100% mapper does not complete / finish / commit
Did you try to add any logging to see what keys they are getting stuck on, or
what the last key processed was? Do the same number of mappers get stuck every
time?

Not having reducers is not a problem. It's pretty normal to do that.
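
As an aside on the logging suggestion above, a minimal sketch of it
(commons-logging, which Hadoop itself uses; LoggingMapper and the messages
are hypothetical), so the attempt's userlogs show the exact key where a task
stops making progress:

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LoggingMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Bracket each record with log lines to pinpoint where it stalls.
        LOG.info("map enter, key=" + key);
        context.write(value, new Text("seen"));
        LOG.info("map exit, key=" + key);
    }
}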

From: Amogh Vasekar <amogh@yahoo-inc.com>
Reply-To: common-user@hadoop.apache.org
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Date: Mon, Nov 2, 2009 at 4:50 PM
Subject: Re: too many 100% mapper does not complete / finish / commit

Hi,
Quick questions...
Are you creating too many small files?
Are there any task-side files being created?
Does the NameNode heap have enough space for its metadata? Any details on its
general health will probably be helpful to people on the list.

Amogh

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)


On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi
<pallavi.palleti@corp.aol.com> wrote:

> Hi Eddy,
>
> I faced a similar issue when I used a Pig script to fetch webpages for
> certain URLs. I could see the map phase showing 100% while the task was
> still running. As I was logging the page it was currently fetching, I
> could see that the process hadn't yet finished. It might be the same
> issue. So, you can add logging to check whether it is actually stuck or
> the process is still going on.
>
> Thanks
> Pallavi
>
> ________________________________
>
> From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> Sent: Monday, November 02, 2009 2:03 PM
> To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> Subject: too many 100% mapper does not complete / finish / commit
>
>
> Dear hadoop fellows,
>
> We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
> this case, we only have mappers, which crawl data and save it into HDFS
> in a distributed way. No reducers are specified in the job conf.
>
> The problem is that for every job about one third of the mappers get
> stuck at 100% progress but never complete. If we look at the tasktracker
> log of those mappers, the last entry is the key input INFO line, and no
> other logs were output after that.
>
> From the stdout log of a specific attempt of one of those mappers, we
> can see that the map function of the mapper has finished completely, so
> control of the execution should be somewhere in the MapReduce framework.
>
> Does anyone have any clue about this problem? Is it because we didn't
> use any reducers? Since two thirds of the mappers complete successfully
> and commit their output data into HDFS, I suspect the stuck mappers have
> something to do with the MapReduce framework code.
>
> Any input will be appreciated. Thanks a lot!
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>
>
