hadoop-common-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: too many 100% mapper does not complete / finish / commit
Date Mon, 02 Nov 2009 10:52:35 GMT
On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <eddymier@gmail.com> wrote:

> Hi Pallavi, Khurana, and Vasekar,
>
> Thanks a lot for your reply. To follow up: the mapper we are using is the
> multithreaded mapper.
>

How are you doing this? Did you write your own MapRunnable?
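If not, for reference, here is a minimal sketch of wiring up the stock
multithreaded runner in the old 0.20 API (MyJob and MyMapper are
hypothetical stand-ins for your own classes):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;

    JobConf conf = new JobConf(MyJob.class);
    conf.setMapperClass(MyMapper.class);
    // Run map() calls on a pool of threads inside each map task.
    conf.setMapRunnerClass(MultithreadedMapRunner.class);
    // Pool size; the default is 10 threads per task.
    conf.setInt("mapred.map.multithreadedrunner.threads", 10);

If you wrote your own MapRunnable instead, a bug in its thread handling
would be the first thing I'd look at.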


>
> To answer your questions:
>
> Pallavi, Khurana: I have checked the logs. The key it got stuck on is the
> last key it read in. Since the progress is 100%, I suppose that is to be
> expected for the last key? From the stdout log of our mapper, we have
> confirmed that the map function of the mapper has completed. After that, no
> more keys were read in and no other progress was made by the mapper, which
> means it didn't complete / commit despite being at 100%. The number of
> stuck mappers differs from job to job, but it is roughly one third to one
> half of the mappers. Since the stdout logs confirm that the map function
> has finished, we started to suspect that the MapReduce framework has
> something to do with the stuck problem.
>
> Here is the log from stdout:
> [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
> Disco</artist></track>
> [0] [293419] start creating objects
> [1] [293419] start parsing xml
> [2] [293419] start updating data
> [sleep] [228312]
> [error] [228312] java.io.IOException: [error] [228312] reaches the maximum
> number of attempts whiling updating
> [3] [228312] start collecting output228312
> [3.1 done with null] [228312] done228312
> [fail] [228312] java.io.IOException: 3.1 throw null228312
> [done] [228312] done228312
> [sleep] [293419]
> [error] [293419] java.io.IOException: [error] [293419] reaches the maximum
> number of attempts whiling updating
> [3] [293419] start collecting output293419
> [3.1 done with null] [293419] done293419
> [fail] [293419] java.io.IOException: 3.1 throw null293419
> [done] [293419] done293419
>
> Here is the log from the tasktracker:
> 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
> 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
> Soleil
> 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> www.China.ie
> 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> www.China.ie
> 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> Mobile Disco
> 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> Mobile Disco
> 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> Mobile Disco
>
> From these logs, we can see that the last entry read in is "i bealive
> artist: Simian Mobile Disco". The last entry processed in the mapper is the
> same one, and from the stdout log we can see that the map function has
> finished....
>

Put some stdout or logging code towards the end of the mapper and also check
whether all the threads are coming back. Do you think it could be an issue
with the threads?
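
For example, something like this (a rough sketch against the old API;
MyMapper stands in for your class) would show whether control ever reaches
the framework's task cleanup:

    import java.io.IOException;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class MyMapper extends MapReduceBase /* implements Mapper<...> */ {
        @Override
        public void close() throws IOException {
            // If these lines never show up in stdout, the task hung
            // before the framework got to cleanup/commit.
            System.out.println("[close] mapper close() reached");
            System.out.println("[close] live threads: " + Thread.activeCount());
        }
    }

If close() is reached but some worker threads are still alive, that points
at the threads rather than the framework.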


> Vasekar: The HDFS is healthy. We haven't stored too many small files in it
> yet. The output of the command "hadoop fsck /" is as follows:
> Total size:    89114318394 B (Total open files size: 19845943808 B)
>  Total dirs:    430
>  Total files:   1761 (Files currently being written: 137)
>  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
> open file blocks (not validated): 309)
>  Minimally replicated blocks:   2691 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     2.802304
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          76
>  Number of racks:               1
>
> Is this problem possibly due to stuck communication between the actual
> task (the mapper) and the tasktracker? From the logs, we cannot see
> anything after the task gets stuck.
>

The TT and JT logs would show if there is lost communication. Enable DEBUG
logging for those processes and keep a tab on them.
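
One way to do that (a sketch; class names assume a default 0.20 install) is
to raise the log level in conf/log4j.properties on the TT/JT machines and
restart the daemons:

    log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
    log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG

The level can also be flipped at runtime through the daemons' HTTP port,
e.g. hadoop daemonlog -setlevel jt-host:50030
org.apache.hadoop.mapred.JobTracker DEBUG (50030/50060 are the default
JT/TT web ports).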


>
>
> From: Amandeep Khurana <amansk@gmail.com>
> Date: Mon, Nov 2, 2009 at 4:36 PM
> Subject: Re: too many 100% mapper does not complete / finish / commit
>
> Did you try to add any logging to see what keys they are getting stuck on,
> or what the last key processed was (see the sketch below)? Do the same
> number of mappers get stuck every time?
>
> Not having reducers is not a problem. It's pretty normal to do that.
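>
> A sketch of the kind of logging I mean (LoggingMapper is hypothetical;
> old 0.20 API):
>
>     import java.io.IOException;
>     import org.apache.hadoop.io.LongWritable;
>     import org.apache.hadoop.io.Text;
>     import org.apache.hadoop.mapred.MapReduceBase;
>     import org.apache.hadoop.mapred.Mapper;
>     import org.apache.hadoop.mapred.OutputCollector;
>     import org.apache.hadoop.mapred.Reporter;
>
>     public class LoggingMapper extends MapReduceBase
>             implements Mapper<LongWritable, Text, Text, Text> {
>         public void map(LongWritable key, Text value,
>                         OutputCollector<Text, Text> output, Reporter reporter)
>                 throws IOException {
>             // The last key printed before the logs go quiet is the one
>             // the task stalled on (or the last one it finished).
>             System.out.println("[map] processing offset " + key);
>             output.collect(new Text("crawled"), value); // real work goes here
>         }
>     }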
>
> From: Amogh Vasekar <amogh@yahoo-inc.com>
> Date: Mon, Nov 2, 2009 at 4:50 PM
> Subject: Re: too many 100% mapper does not complete / finish / commit
>
> Hi,
> Quick questions...
> Are you creating too many small files?
> Are there any task-side files being created?
> Does the NN heap have enough space for its metadata? Any details on its
> general health will probably be helpful to people on the list.
>
> Amogh
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi
> <pallavi.palleti@corp.aol.com> wrote:
>
> > Hi Eddy,
> >
> > I faced a similar issue when I used a Pig script for fetching webpages
> > for certain URLs. I could see the map phase showing 100% while it was
> > still running. As I was logging the page currently being fetched, I
> > could see the process hadn't yet finished. It might be the same issue.
> > So, you can add logging to check whether it is actually stuck or the
> > process is still going on.
> >
> > Thanks
> > Pallavi
> >
> > ________________________________
> >
> > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> > Sent: Monday, November 02, 2009 2:03 PM
> > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > Subject: too many 100% mapper does not complete / finish / commit
> >
> >
> > Dear hadoop fellows,
> >
> > We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
> > this case, we only have mappers, which crawl data and save it into HDFS
> > in a distributed way. No reducers are specified in the job conf.
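> >
> > For a map-only job like this, the reducer count is set to zero so each
> > mapper's output is written straight to HDFS by the OutputFormat, with
> > no shuffle or sort. A minimal sketch of such a job conf (CrawlJob is a
> > hypothetical class name):
> >
> >     import org.apache.hadoop.mapred.JobConf;
> >
> >     JobConf conf = new JobConf(CrawlJob.class);
> >     conf.setJobName("crawler");
> >     // Zero reduces makes this a map-only job.
> >     conf.setNumReduceTasks(0);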
> >
> > The problem is that in every job about one third of the mappers get
> > stuck at 100% progress but never complete. If we look at the tasktracker
> > log of those mappers, the last entry is the INFO line for the input key,
> > and no other logs were output after that.
> > From the stdout log of a specific attempt of one of those mappers, we
> > can see that the map function of the mapper has finished completely,
> > so control of the execution should be somewhere in the MapReduce
> > framework.
> >
> > Does anyone have any clue about this problem? Is it because we didn't
> > use any reducers? Since two thirds of the mappers complete successfully
> > and commit their output data into HDFS, I suspect the stuck mappers
> > have something to do with the MapReduce framework code.
> >
> > Any input will be appreciated. Thanks a lot!
> >
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> > Tel No: +65-96188110 (M)
> >
> >
>
