hadoop-common-user mailing list archives

From: "Zhang Bingjun (Eddy)" <eddym...@gmail.com>
Subject: Re: too many 100% mapper does not complete / finish / commit
Date: Mon, 02 Nov 2009 11:15:18 GMT
Dear Khurana,

We didn't use MapRunnable. Instead, we directly used
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper and passed our
normal Mapper class to it via its setMapperClass() interface. We set the
number of threads using setNumberOfThreads(). Is this a correct way of
doing a multithreaded mapper?

We noticed in hadoop-0.20.1 there is another
MultithreadedMapper, org.apache.hadoop.mapred.lib.map.MultithreadedMapper,
but we didn't touch it.
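For reference, here is a minimal sketch of the setup described above, using the new-API MultithreadedMapper in hadoop-0.20.x. WebCrawlMapper is a hypothetical placeholder for our own Mapper class, and the thread count of 10 is just an illustrative value:

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class JobSetup {
    // WebCrawlMapper is a placeholder for your own Mapper implementation.
    public static void configure(Job job) throws Exception {
        // The job's mapper is MultithreadedMapper itself; it runs a pool
        // of threads inside each map task, each invoking the real mapper.
        job.setMapperClass(MultithreadedMapper.class);
        MultithreadedMapper.setMapperClass(job, WebCrawlMapper.class);
        MultithreadedMapper.setNumberOfThreads(job, 10);
    }
}
```

Note that the underlying Mapper must be thread-safe, since multiple threads share one task JVM and one output collector.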

It might be that some thread didn't return. We need to do some work to
confirm that. We will also try to enable Hadoop's DEBUG logging. Could you
share some info on starting a Hadoop daemon, or the whole Hadoop cluster, in
debug mode?
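For anyone else following along, a sketch of two common ways to get DEBUG logging out of a 0.20-era cluster (host names and ports here are assumptions for a typical default setup):

```shell
# Option 1: raise the log level of a running daemon without restarting it.
# hadoop daemonlog talks to the daemon's HTTP port (e.g. 50030 for the
# JobTracker, 50060 for a TaskTracker).
hadoop daemonlog -setlevel jobtracker-host:50030 org.apache.hadoop.mapred DEBUG

# Option 2: make it permanent by editing conf/log4j.properties on each node
# and restarting the daemons, e.g. add:
#   log4j.logger.org.apache.hadoop.mapred=DEBUG
```

Option 1 only lasts until the daemon restarts, which makes it handy for chasing an intermittent hang like this one.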

Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)


On Mon, Nov 2, 2009 at 6:58 PM, Zhang Bingjun (Eddy) <eddymier@gmail.com> wrote:

> Hi all,
>
> An important observation: the 100% mappers that never complete all have
> temporary files of exactly 64MB, which means the output of the mapper is cut
> off at the block boundary. However, we do have some successfully completed
> mappers with output files larger than 64MB, and we also have mappers below
> 100% whose temporary files are larger than 64MB.
>
> Here is the info returned by "hadoop fs -ls
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0":
> -rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091
>
> This is the temporary file of a 100% mapper without completion.
>
> Any clues on this?
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <amansk@gmail.com> wrote:
>
>> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <eddymier@gmail.com> wrote:
>>
>> > Hi Pallavi, Khurana, and Vasekar,
>> >
>> > Thanks a lot for your reply. To add to that: the mapper we are using is
>> > the multithreaded mapper.
>> >
>>
>> How are you doing this? Did you write your own MapRunnable?
>>
>>
>
>> >
>> > To answer your questions:
>> >
>> > Pallavi, Khurana: I have checked the logs. The key it got stuck on is the
>> > last key it read in. Since the progress is 100%, I suppose the key is the
>> > last key? From the stdout log of our mapper, we have confirmed that the
>> > map function of the mapper has completed. After that, no more keys were
>> > read in and no further progress was made by the mapper, which means it
>> > didn't complete / commit despite being at 100%. For each job, a different
>> > number of mappers get stuck, but it is roughly one third to half of them.
>> > From the stdout logs of our mapper, we have also confirmed that the map
>> > function of the mapper has finished. That's why we started to suspect the
>> > MapReduce framework has something to do with the stuck problem.
>> >
>> > Here is log from the stdout:
>> > [entry] [293419] <track><name>i bealive</name><artist>Simian
Mobile
>> > Disco</artist></track>
>> > [0] [293419] start creating objects
>> > [1] [293419] start parsing xml
>> > [2] [293419] start updating data
>> > [sleep] [228312]
>> > [error] [228312] java.io.IOException: [error] [228312] reaches the
>> maximum
>> > number of attempts whiling updating
>> > [3] [228312] start collecting output228312
>> > [3.1 done with null] [228312] done228312
>> > [fail] [228312] java.io.IOException: 3.1 throw null228312
>> > [done] [228312] done228312
>> > [sleep] [293419]
>> > [error] [293419] java.io.IOException: [error] [293419] reaches the
>> maximum
>> > number of attempts whiling updating
>> > [3] [293419] start collecting output293419
>> > [3.1 done with null] [293419] done293419
>> > [fail] [293419] java.io.IOException: 3.1 throw null293419
>> > [done] [293419] done293419
>> >
>> > Here is the log from tasktracker:
>> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
>> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
>> > Soleil
>> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
>> > www.China.ie
>> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
>> > www.China.ie
>> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
>> > Mobile Disco
>> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
>> > Mobile Disco
>> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
>> > Mobile Disco
>> >
>> > From these logs, we can see that the last entry read in is "i bealive
>> > artist: Simian Mobile Disco". The last entry processed in the mapper is
>> > the same as this entry, and from the stdout log we can see the map
>> > function has finished....
>> >
>>
>> Put some stdout or logging code towards the end of the mapper and also
>> check whether all threads are coming back. Do you think it could be some
>> issue with the threads?
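One hedged way to check whether the worker threads came back is a thread dump of the stuck task's child JVM on the tasktracker node (the grep pattern and file name below are just illustrative):

```shell
# Find the stuck map task's child JVM on the tasktracker node.
jps -l | grep Child

# Dump its threads. A MultithreadedMapper worker thread that never
# returned will still show up here (RUNNABLE or WAITING) long after
# the map function has logged completion.
jstack <pid> > stuck-mapper-threads.txt
```

Comparing a dump from a stuck task against one from a healthy task usually makes a hung thread stand out quickly.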
>>
>>
>> > Vasekar: The HDFS is healthy. We haven't stored too many small files in
>> > it yet. The output of "hadoop fsck /" is as follows:
>> > Total size:    89114318394 B (Total open files size: 19845943808 B)
>> >  Total dirs:    430
>> >  Total files:   1761 (Files currently being written: 137)
>> >  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
>> > open file blocks (not validated): 309)
>> >  Minimally replicated blocks:   2691 (100.0 %)
>> >  Over-replicated blocks:        0 (0.0 %)
>> >  Under-replicated blocks:       0 (0.0 %)
>> >  Mis-replicated blocks:         0 (0.0 %)
>> >  Default replication factor:    3
>> >  Average block replication:     2.802304
>> >  Corrupt blocks:                0
>> >  Missing replicas:              0 (0.0 %)
>> >  Number of data-nodes:          76
>> >  Number of racks:               1
>> >
>> > Is this problem possibly due to stuck communication between the actual
>> > task (the mapper) and the tasktracker? From the logs, we cannot see
>> > anything after the task gets stuck.
>> >
>>
>> The TT and JT logs would show if there is lost communication. Enable DEBUG
>> logging for the processes and keep an eye on them.
>>
>>
>> >
>> > From: Amandeep Khurana <amansk@gmail.com>
>> > Reply-to: common-user@hadoop.apache.org
>> > To: common-user@hadoop.apache.org
>> > Date: Mon, Nov 2, 2009 at 4:36 PM
>> > Subject: Re: too many 100% mapper does not complete / finish / commit
>> >
>> > Did you try to add any logging and see what keys are they getting stuck
>> on
>> > or whats the last keys it processed? Do the same number of mappers get
>> > stuck
>> > every time?
>> >
>> > Not having reducers is not a problem. Its pretty normal to do that.
>> >
>> > From: Amogh Vasekar <amogh@yahoo-inc.com>
>> > Reply-to: common-user@hadoop.apache.org
>> > To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
>> > Date: Mon, Nov 2, 2009 at 4:50 PM
>> > Subject: Re: too many 100% mapper does not complete / finish / commit
>> >
>> >
>> > Hi,
>> > Quick questions...
>> > Are you creating too many small files?
>> > Are there any task side files being created?
>> > Does the NN heap have enough space for the metadata? Any details on its
>> > general health will probably be helpful to people on the list.
>> >
>> > Amogh
>> > Best regards,
>> > Zhang Bingjun (Eddy)
>> >
>> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
>> > Tel No: +65-96188110 (M)
>> >
>> >
>> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
>> > pallavi.palleti@corp.aol.com> wrote:
>> >
>> > > Hi Eddy,
>> > >
>> > > I faced a similar issue when I used a Pig script to fetch webpages for
>> > > certain URLs. I could see the map phase showing 100% while it was still
>> > > running. As I was logging the page currently being fetched, I could see
>> > > the process hadn't finished yet. It might be the same issue. So, you
>> > > can add logging to check whether it is actually stuck or the process is
>> > > still going on.
>> > >
>> > > Thanks
>> > > Pallavi
>> > >
>> > > ________________________________
>> > >
>> > > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
>> > > Sent: Monday, November 02, 2009 2:03 PM
>> > > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
>> > > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
>> > > Subject: too many 100% mapper does not complete / finish / commit
>> > >
>> > >
>> > > Dear hadoop fellows,
>> > >
>> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
>> > > this case, we only have mappers, which crawl data and save it into HDFS
>> > > in a distributed way. No reducers are specified in the job conf.
>> > >
>> > > The problem is that for every job, about one third of the mappers get
>> > > stuck at 100% progress but never complete. If we look at the
>> > > tasktracker log of those mappers, the last log was the key input INFO
>> > > log line, and no other logs were output after that.
>> > >
>> > > From the stdout log of a specific attempt of one of those mappers, we
>> > > can see that the map function of the mapper has finished completely,
>> > > and control of the execution should be somewhere in the MapReduce
>> > > framework part.
>> > >
>> > > Does anyone have any clue about this problem? Is it because we didn't
>> > > use any reducers? Since two thirds of the mappers complete successfully
>> > > and commit their output data into HDFS, we suspect the stuck mappers
>> > > have something to do with the MapReduce framework code.
>> > >
>> > > Any input will be appreciated. Thanks a lot!
>> > >
>> > > Best regards,
>> > > Zhang Bingjun (Eddy)
>> > >
>> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
>> bingjun@comp.nus.edu.sg
>> > > Tel No: +65-96188110 (M)
>> > >
>> > >
>> >
>>
>
>
