hadoop-mapreduce-user mailing list archives

From Hardik Pandya <smarty.ju...@gmail.com>
Subject Re: Map succeeds but reduce hangs
Date Wed, 01 Jan 2014 21:54:43 GMT
Do you have your hostnames properly configured in /etc/hosts? Have you tried
192.168.?.? instead of localhost 127.0.0.1?
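
A minimal sketch of the /etc/hosts check suggested above, assuming a standard hosts-file layout (address first, then names). The function name `loopback_hostnames`, the sample addresses, and the list of "expected" loopback names are illustrative assumptions, not taken from the poster's machines. It flags any cluster hostname that maps to a loopback address such as 127.0.0.1 or 127.0.1.1.

```python
import ipaddress

# Names that normally point at loopback and are not a problem (assumed list).
EXPECTED_LOOPBACK_NAMES = {
    "localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback",
}


def loopback_hostnames(hosts_text):
    """Return (hostname, address) pairs that a hosts file maps to loopback."""
    flagged = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        address, *names = line.split()
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # not an IP address, skip the line
        if ip.is_loopback:
            flagged.extend(
                (name, address)
                for name in names
                if name not in EXPECTED_LOOPBACK_NAMES
            )
    return flagged


# Hypothetical hosts file: slave2 is wrongly bound to a loopback address.
SAMPLE_HOSTS = """\
127.0.0.1    localhost
127.0.1.1    slave2
192.168.0.1  master
192.168.0.2  slave1
"""
```

On each node, passing the contents of /etc/hosts to `loopback_hostnames` should return an empty list if every cluster hostname is bound to a routable address; with `SAMPLE_HOSTS` it reports `slave2`.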



On Wed, Jan 1, 2014 at 11:33 AM, navaz <navaz.enc@gmail.com> wrote:

> Thanks. But I wonder why map succeeds 100%. How does it resolve the hostname?
>
> Now reduce reaches 100%, but only after bailing out on slave2 and slave3
> (even though mapping succeeded on these nodes).
>
> Does it look up the hostname only for reduce?
>
>
> 14/01/01 09:09:38 INFO mapred.JobClient: Running job: job_201401010908_0001
> 14/01/01 09:09:39 INFO mapred.JobClient:  map 0% reduce 0%
> 14/01/01 09:10:00 INFO mapred.JobClient:  map 33% reduce 0%
> 14/01/01 09:10:01 INFO mapred.JobClient:  map 66% reduce 0%
> 14/01/01 09:10:05 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:10:14 INFO mapred.JobClient:  map 100% reduce 22%
> 14/01/01 09:17:32 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:17:35 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:17:46 INFO mapred.JobClient:  map 100% reduce 11%
> 14/01/01 09:17:50 INFO mapred.JobClient:  map 100% reduce 22%
> 14/01/01 09:25:06 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:25:10 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_1, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:25:34 INFO mapred.JobClient:  map 100% reduce 100%
> 14/01/01 09:25:42 INFO mapred.JobClient: Job complete: job_201401010908_0001
> 14/01/01 09:25:42 INFO mapred.JobClient: Counters: 29
>
>
>
> Job Tracker logs:
> 2014-01-01 09:09:59,874 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000002_0' has completed task_201401010908_0001_m_000002 successfully.
> 2014-01-01 09:10:04,231 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000001_0' has completed task_201401010908_0001_m_000001 successfully.
> 2014-01-01 09:17:30,527 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2014-01-01 09:17:30,528 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
> 2014-01-01 09:17:30,529 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_0' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:44663'
> 2014-01-01 09:17:35,130 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
> 2014-01-01 09:17:35,213 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
> 2014-01-01 09:25:05,494 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
> 2014-01-01 09:25:10,087 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
> 2014-01-01 09:25:10,109 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_2' to tip task_201401010908_0001_r_000000, for tracker 'tracker_master:localhost/127.0.0.1:57156'
> 2014-01-01 09:25:33,340 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_r_000000_2' has completed task_201401010908_0001_r_000000 successfully.
> 2014-01-01 09:25:33,462 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201401010908_0001_m_000003_0' to tip task_201401010908_0001_m_000003, for tracker 'tracker_master:localhost/127.0.0.1:57156'
> 2014-01-01 09:25:42,304 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000003_0' has completed task_201401010908_0001_m_000003 successfully.
>
>
> On Tue, Dec 31, 2013 at 4:56 PM, Hardik Pandya <smarty.juice@gmail.com> wrote:
>
>> As expected, it's failing during shuffle.
>>
>> It seems the hostnames of the slave nodes could not be resolved.
>>
>> Have you configured your slaves' hostnames correctly?
>>
>> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>
>>
>>
>>
>> On Tue, Dec 31, 2013 at 4:42 PM, navaz <navaz.enc@gmail.com> wrote:
>>
>>> Hi
>>>
>>> My hdfs-site is configured for 4 nodes (one master and 3 slaves).
>>>
>>> <property>
>>>  <name>dfs.replication</name>
>>>  <value>4</value>
>>> </property>
>>>
>>> Running start-dfs.sh and stop-mapred.sh doesn't solve the problem.
>>>
>>> I also tried running the program after formatting the namenode (master),
>>> which also fails.
>>>
>>> My jobtracker logs on the master (name node) are given below.
>>>
>>>
>>>
>>> 2013-12-31 14:27:35,534 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004: nMaps=3 nReduces=1 max=-1
>>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Job job_201312311107_0004 added successfully for user 'hduser' to queue 'default'
>>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.AuditLogger: USER=hduser  IP=155.98.39.28 OPERATION=SUBMIT_JOB    TARGET=job_201312311107_0004     RESULT=SUCCESS
>>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201312311107_0004
>>> 2013-12-31 14:27:35,595 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201312311107_0004
>>> 2013-12-31 14:27:35,785 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /app/hadoop/tmp/mapred/system/job_201312311107_0004/jobToken
>>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201312311107_0004 = 3671523. Number of splits = 3
>>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/master
>>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave2
>>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave1
>>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave3
>>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/master
>>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave1
>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave3
>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave2
>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/master
>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave1
>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave2
>>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave3
>>> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004 LOCALITY_WAIT_FACTOR=1.0
>>> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201312311107_0004 initialized successfully with 3 map tasks and 1 reduce tasks.
>>> 2013-12-31 14:27:35,913 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201312311107_0004_m_000004_0' to tip task_201312311107_0004_m_000004, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>> 2013-12-31 14:27:40,876 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000004_0' has completed task_201312311107_0004_m_000004 successfully.
>>> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000000_0' to tip task_201312311107_0004_m_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000000
>>> 2013-12-31 14:27:40,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000001_0' to tip task_201312311107_0004_m_000001, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>>> 2013-12-31 14:27:40,908 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000001
>>> 2013-12-31 14:27:41,122 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000002_0' to tip task_201312311107_0004_m_000002, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
>>> 2013-12-31 14:27:41,123 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000002
>>> 2013-12-31 14:27:49,659 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000002_0' has completed task_201312311107_0004_m_000002 successfully.
>>> 2013-12-31 14:27:49,662 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0004_r_000000_0' to tip task_201312311107_0004_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
>>> 2013-12-31 14:27:50,338 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000000_0' has completed task_201312311107_0004_m_000000 successfully.
>>> 2013-12-31 14:27:51,168 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000001_0' has completed task_201312311107_0004_m_000001 successfully.
>>> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>>> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>>> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>>> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>> hduser@pc228:/usr/local/hadoop/logs$
>>>
>>>
>>> I am following the document below to configure the Hadoop cluster.
>>>
>>>
>>> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>>>
>>> Did I miss something? Please guide me.
>>>
>>> Thanks
>>> Navaz
>>>
>>>
>>> On Tue, Dec 31, 2013 at 3:25 PM, Hardik Pandya <smarty.juice@gmail.com> wrote:
>>>
>>>> What does your job log say? Is your hdfs-site configured properly to
>>>> find 3 data nodes? This could very well be getting stuck in the shuffle phase.
>>>>
>>>> Last thing to try: does stop-all and start-all help? As a last resort, try
>>>> formatting the namenode.
>>>>
>>>>
>>>> On Tue, Dec 31, 2013 at 11:40 AM, navaz <navaz.enc@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>>
>>>>> I am running a Hadoop cluster with 1 name node and 3 data nodes.
>>>>>
>>>>> My HDFS looks like this.
>>>>>
>>>>> hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
>>>>> Warning: $HADOOP_HOME is deprecated.
>>>>>
>>>>> Found 7 items
>>>>> -rw-r--r--   4 hduser supergroup     343691 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg132.txt
>>>>> -rw-r--r--   4 hduser supergroup     594933 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg1661.txt
>>>>> -rw-r--r--   4 hduser supergroup    1945886 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg19699.txt
>>>>> -rw-r--r--   4 hduser supergroup     674570 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg20417.txt
>>>>> -rw-r--r--   4 hduser supergroup    1573150 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg4300.txt
>>>>> -rw-r--r--   4 hduser supergroup    1423803 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg5000.txt
>>>>> -rw-r--r--   4 hduser supergroup     393968 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg972.txt
>>>>> hduser@nm:/usr/local/hadoop$
>>>>>
>>>>> When I start the MapReduce wordcount program, mapping reaches 100% and
>>>>> reduce hangs at 14%.
>>>>>
>>>>> hduser@nm:~$ hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/gutenberg /user/hduser/getty/gutenberg_out3
>>>>> Warning: $HADOOP_HOME is deprecated.
>>>>>
>>>>> 13/12/31 09:31:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 13/12/31 09:31:07 INFO input.FileInputFormat: Total input paths to process : 7
>>>>> 13/12/31 09:31:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>>>> 13/12/31 09:31:08 WARN snappy.LoadSnappy: Snappy native library not loaded
>>>>> 13/12/31 09:31:08 INFO mapred.JobClient: Running job: job_201312310929_0001
>>>>> 13/12/31 09:31:09 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 13/12/31 09:31:29 INFO mapred.JobClient:  map 14% reduce 0%
>>>>> 13/12/31 09:31:34 INFO mapred.JobClient:  map 32% reduce 0%
>>>>> 13/12/31 09:31:35 INFO mapred.JobClient:  map 75% reduce 0%
>>>>> 13/12/31 09:31:36 INFO mapred.JobClient:  map 90% reduce 0%
>>>>> 13/12/31 09:31:37 INFO mapred.JobClient:  map 99% reduce 0%
>>>>> 13/12/31 09:31:38 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 13/12/31 09:31:43 INFO mapred.JobClient:  map 100% reduce 14%
>>>>>
>>>>> <HANGS HERE>
>>>>>
>>>>> Could you please help me resolve this issue?
>>>>>
>>>>>
>>>>> Thanks & Regards
>>>>> *Abdul Navaz*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Abdul Navaz*
>>> *Masters in Network Communications*
>>> *University of Houston*
>>> *Houston, TX - 77204-4020*
>>> *Ph - 281-685-0388 <281-685-0388>*
>>> *fabdulnavaz@uh.edu* <fabdulnavaz@uh.edu>
>>>
>>>
>>
>
>
> --
> *Abdul Navaz*
> *Masters in Network Communications*
> *University of Houston*
> *Houston, TX - 77204-4020*
> *Ph - 281-685-0388 <281-685-0388>*
> *fabdulnavaz@uh.edu* <fabdulnavaz@uh.edu>
>
>
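
In the JobTracker lines quoted in this thread, every tracker is registered in the form `tracker_slave3:localhost/127.0.0.1:44663`, which is consistent with the hostname problem suggested above; the thread itself does not confirm the root cause. The sketch below is one way to pull those registrations out of a log, assuming each log entry has been rejoined onto a single line. The function name `trackers_on_loopback` and the regular expression are illustrative assumptions, not part of Hadoop.

```python
import ipaddress
import re

# Matches the "for tracker 'tracker_<node>:<host>/<ip>:<port>'" suffix
# seen in the JobTracker log lines quoted in this thread.
TRACKER_PATTERN = re.compile(
    r"for tracker 'tracker_(?P<node>[^:']+):(?P<host>[^/']+)"
    r"/(?P<ip>[0-9.]+):(?P<port>\d+)'"
)


def trackers_on_loopback(log_text):
    """Map node name -> 'host/ip' for trackers registered on loopback."""
    found = {}
    for match in TRACKER_PATTERN.finditer(log_text):
        try:
            ip = ipaddress.ip_address(match["ip"])
        except ValueError:
            continue  # malformed address, ignore the entry
        if ip.is_loopback:
            found[match["node"]] = f'{match["host"]}/{match["ip"]}'
    return found


# Two entries copied from the quoted logs, each rejoined onto one line.
SAMPLE_LOG = (
    "2014-01-01 09:17:30,529 INFO org.apache.hadoop.mapred.JobTracker: "
    "Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_0' to tip "
    "task_201401010908_0001_r_000000, for tracker "
    "'tracker_slave3:localhost/127.0.0.1:44663'\n"
    "2014-01-01 09:17:35,213 INFO org.apache.hadoop.mapred.JobTracker: "
    "Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_1' to tip "
    "task_201401010908_0001_r_000000, for tracker "
    "'tracker_slave2:localhost/127.0.0.1:51438'\n"
)
```

Calling `trackers_on_loopback(SAMPLE_LOG)` lists `slave3` and `slave2` as registered on `localhost/127.0.0.1`; an empty result after fixing /etc/hosts and restarting the daemons would indicate the trackers now register with routable addresses.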
