hadoop-common-user mailing list archives

From Hardik Pandya <smarty.ju...@gmail.com>
Subject Re: Map succeeds but reduce hangs
Date Tue, 31 Dec 2013 21:56:19 GMT
As expected, it's failing during the shuffle.

It seems the slave nodes' host names are not resolving correctly, so the reducer cannot fetch the map output from them.

Have you configured your slaves' host names correctly?
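
One quick way to check this is to resolve every cluster host name on each node and see what comes back. The sketch below is only an illustration, not a Hadoop tool: the function name `check_hosts` is made up here, and the host names `master`, `slave1`, `slave2` and `slave3` are taken from the JobTracker log in this thread. It flags names that do not resolve at all and names that resolve to a loopback address (on Ubuntu, /etc/hosts often maps the machine's own host name to 127.0.1.1, which other nodes cannot reach).

```python
import ipaddress
import socket

# Host names as they appear in the JobTracker log in this thread (assumption:
# these are the names used in the masters/slaves files); adjust as needed.
CLUSTER_HOSTS = ["master", "slave1", "slave2", "slave3"]


def check_hosts(hosts, resolve=socket.gethostbyname):
    """Return {host: (ip_or_None, status)}.

    status is 'ok', 'loopback' (resolves to 127.0.0.0/8) or 'unresolved'.
    `resolve` is injectable so the logic can be exercised without DNS.
    """
    report = {}
    for host in hosts:
        try:
            ip = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            report[host] = (None, "unresolved")
            continue
        status = "loopback" if ipaddress.ip_address(ip).is_loopback else "ok"
        report[host] = (ip, status)
    return report


# Example call, to be run on every node of the cluster:
#   for host, (ip, status) in check_hosts(CLUSTER_HOSTS).items():
#       print(host, ip, status)
```

Every node should report the same non-loopback address for every host name; a `loopback` or `unresolved` entry on any node is worth fixing in /etc/hosts or DNS before rerunning the job.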

2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
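
To see which jobs the shuffle errors belong to, the JobTracker log can be scanned for `Shuffle Error` lines. The sketch below is only an illustration: the function name `shuffle_failures` and the regular expression are assumptions written against the log lines shown above, not part of Hadoop. Note that the failing attempt above carries the job id 201312311107_0003, while the quoted log below starts with job_201312311107_0004, so the error may come from an earlier job that is still being retried.

```python
import re
from collections import defaultdict

# Matches JobTracker lines of the form seen in this thread:
#   "... TaskInProgress: Error from attempt_<jobid>_r_<task>_<try>: Shuffle Error: ..."
# \s+ also tolerates a line wrap between "Error" and "from".
SHUFFLE_ERROR = re.compile(
    r"Error\s+from\s+(attempt_(\d+_\d+)_r_\d+_\d+):\s+Shuffle Error"
)


def shuffle_failures(log_text):
    """Group reduce attempts that reported a shuffle error by job id."""
    failures = defaultdict(list)
    for match in SHUFFLE_ERROR.finditer(log_text):
        attempt, job_suffix = match.group(1), match.group(2)
        failures["job_" + job_suffix].append(attempt)
    return dict(failures)


# Sample line copied from the log excerpt above.
SAMPLE = (
    "2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: "
    "Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: "
    "Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out."
)

print(shuffle_failures(SAMPLE))
```

To run it against a real log, read the JobTracker log file into a string and pass it to `shuffle_failures`.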




On Tue, Dec 31, 2013 at 4:42 PM, navaz <navaz.enc@gmail.com> wrote:

> Hi
>
> My hdfs-site.xml is configured for 4 nodes (one master and 3 slaves).
>
> <property>
>  <name>dfs.replication</name>
>  <value>4</value>
> </property>
>
> Running start-dfs.sh and stop-mapred.sh doesn't solve the problem.
>
> I also tried running the program after formatting the namenode (master),
> which also fails.
>
> My JobTracker log on the master (name node) is given below.
>
>
>
> 2013-12-31 14:27:35,534 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004: nMaps=3 nReduces=1 max=-1
> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Job job_201312311107_0004 added successfully for user 'hduser' to queue 'default'
> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.AuditLogger: USER=hduser  IP=155.98.39.28 OPERATION=SUBMIT_JOB    TARGET=job_201312311107_0004     RESULT=SUCCESS
> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201312311107_0004
> 2013-12-31 14:27:35,595 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201312311107_0004
> 2013-12-31 14:27:35,785 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /app/hadoop/tmp/mapred/system/job_201312311107_0004/jobToken
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201312311107_0004 = 3671523. Number of splits = 3
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/master
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave2
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave1
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave3
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/master
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave1
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave3
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave2
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/master
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave1
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave2
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave3
> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004 LOCALITY_WAIT_FACTOR=1.0
> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201312311107_0004 initialized successfully with 3 map tasks and 1 reduce tasks.
> 2013-12-31 14:27:35,913 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201312311107_0004_m_000004_0' to tip task_201312311107_0004_m_000004, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> 2013-12-31 14:27:40,876 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000004_0' has completed task_201312311107_0004_m_000004 successfully.
> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000000_0' to tip task_201312311107_0004_m_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000000
> 2013-12-31 14:27:40,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000001_0' to tip task_201312311107_0004_m_000001, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
> 2013-12-31 14:27:40,908 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000001
> 2013-12-31 14:27:41,122 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000002_0' to tip task_201312311107_0004_m_000002, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
> 2013-12-31 14:27:41,123 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000002
> 2013-12-31 14:27:49,659 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000002_0' has completed task_201312311107_0004_m_000002 successfully.
> 2013-12-31 14:27:49,662 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0004_r_000000_0' to tip task_201312311107_0004_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
> 2013-12-31 14:27:50,338 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000000_0' has completed task_201312311107_0004_m_000000 successfully.
> 2013-12-31 14:27:51,168 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000001_0' has completed task_201312311107_0004_m_000001 successfully.
> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> hduser@pc228:/usr/local/hadoop/logs$
>
>
> I am following the document below to configure the Hadoop cluster.
>
>
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
> Did I miss something? Please guide.
>
> Thanks
> Navaz
>
>
> On Tue, Dec 31, 2013 at 3:25 PM, Hardik Pandya <smarty.juice@gmail.com> wrote:
>
>> What does your job log say? Is your hdfs-site configured properly to
>> find 3 data nodes? This could very well be getting stuck in the shuffle phase.
>>
>> Last thing to try: do stop-all and start-all help? Even worse, try
>> formatting the namenode.
>>
>>
>> On Tue, Dec 31, 2013 at 11:40 AM, navaz <navaz.enc@gmail.com> wrote:
>>
>>> Hi
>>>
>>>
>>> I am running a Hadoop cluster with 1 name node and 3 data nodes.
>>>
>>> My HDFS looks like this.
>>>
>>> hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
>>> Warning: $HADOOP_HOME is deprecated.
>>>
>>> Found 7 items
>>> -rw-r--r--   4 hduser supergroup     343691 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg132.txt
>>> -rw-r--r--   4 hduser supergroup     594933 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg1661.txt
>>> -rw-r--r--   4 hduser supergroup    1945886 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg19699.txt
>>> -rw-r--r--   4 hduser supergroup     674570 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg20417.txt
>>> -rw-r--r--   4 hduser supergroup    1573150 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg4300.txt
>>> -rw-r--r--   4 hduser supergroup    1423803 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg5000.txt
>>> -rw-r--r--   4 hduser supergroup     393968 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg972.txt
>>> hduser@nm:/usr/local/hadoop$
>>>
>>> When I start the MapReduce wordcount program, the map phase reaches 100%
>>> and the reduce hangs at 14%.
>>>
>>> hduser@nm:~$ hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/gutenberg /user/hduser/getty/gutenberg_out3
>>> Warning: $HADOOP_HOME is deprecated.
>>>
>>> 13/12/31 09:31:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>> 13/12/31 09:31:07 INFO input.FileInputFormat: Total input paths to process : 7
>>> 13/12/31 09:31:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>> 13/12/31 09:31:08 WARN snappy.LoadSnappy: Snappy native library not loaded
>>> 13/12/31 09:31:08 INFO mapred.JobClient: Running job: job_201312310929_0001
>>> 13/12/31 09:31:09 INFO mapred.JobClient:  map 0% reduce 0%
>>> 13/12/31 09:31:29 INFO mapred.JobClient:  map 14% reduce 0%
>>> 13/12/31 09:31:34 INFO mapred.JobClient:  map 32% reduce 0%
>>> 13/12/31 09:31:35 INFO mapred.JobClient:  map 75% reduce 0%
>>> 13/12/31 09:31:36 INFO mapred.JobClient:  map 90% reduce 0%
>>> 13/12/31 09:31:37 INFO mapred.JobClient:  map 99% reduce 0%
>>> 13/12/31 09:31:38 INFO mapred.JobClient:  map 100% reduce 0%
>>> 13/12/31 09:31:43 INFO mapred.JobClient:  map 100% reduce 14%
>>>
>>> <HANGS HERE>
>>>
>>> Could you please help me resolve this issue?
>>>
>>>
>>> Thanks & Regards
>>> *Abdul Navaz*
>>>
>>>
>>>
>>>
>>
>
>
> --
> *Abdul Navaz*
> *Masters in Network Communications*
> *University of Houston*
> *Houston, TX - 77204-4020*
> *Ph - 281-685-0388*
> *fabdulnavaz@uh.edu*
>
>
