hadoop-hdfs-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Map succeeds but reduce hangs
Date Thu, 02 Jan 2014 18:28:53 GMT
Check the TaskTracker configuration in mapred-site.xml: mapred.task.tracker.report.address.
You may be setting it to 127.0.0.1:0 or localhost:0. Change it to 0.0.0.0:0 and restart the
daemons.
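For reference, the property Vinod mentions would look roughly like this in mapred-site.xml (a sketch, not taken from the thread; `0.0.0.0:0` binds the TaskTracker's report interface on all addresses with an ephemeral port):

```xml
<!-- mapred-site.xml: bind the TaskTracker report address to all
     interfaces instead of only the loopback address -->
<property>
  <name>mapred.task.tracker.report.address</name>
  <value>0.0.0.0:0</value>
</property>
```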

Thanks,
+Vinod

On Jan 1, 2014, at 2:14 PM, navaz <navaz.enc@gmail.com> wrote:

> I don't know why it is running on localhost. I have commented it out.
> ==================================================================
> slave1:
> Hostname: pc321
> 
> hduser@pc321:/etc$ vi hosts
> #127.0.0.1      localhost loghost localhost.myslice.ch-geni-net.emulab.net
> 155.98.39.28    pc228
> 155.98.39.121   pc321
> 155.98.39.27    dn3.myslice.ch-geni-net.emulab.net
> ========================================================================
> slave2:
> hostname: dn3.myslice.ch-geni-net.emulab.net
> hduser@dn3:/etc$ vi hosts
> #127.0.0.1      localhost loghost localhost.myslice.ch-geni-net.emulab.net
> 155.98.39.28    pc228
> 155.98.39.121   pc321
> 155.98.39.27    dn3.myslice.ch-geni-net.emulab.net
> ========================================================================
> Master:
> Hostname: pc228
> hduser@pc228:/etc$ vi hosts
> #127.0.0.1      localhost loghost localhost.myslice.ch-geni-net.emulab.net
> 155.98.39.28   pc228
> 155.98.39.121  pc321
> #155.98.39.19   slave2
> 155.98.39.27   dn3.myslice.ch-geni-net.emulab.net
> ============================================================================
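One quick way to confirm these hosts files behave as intended is to check what each node's own hostname resolves to; if it resolves to 127.0.0.1, the TaskTracker registers as `localhost` and reducers on other nodes cannot fetch map output. A minimal sketch (not from the thread; run it on every node):

```python
import socket

def check_hostname_resolution(hostname):
    """Return (ip, ok): ok is False when the name is unresolvable
    or resolves to the loopback interface."""
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        return None, False
    return ip, not ip.startswith("127.")

if __name__ == "__main__":
    # On a correctly configured node this prints the external IP
    # (e.g. 155.98.39.28), never 127.0.0.1.
    ip, ok = check_hostname_resolution(socket.gethostname())
    print(ip, "OK" if ok else "WARNING: fix /etc/hosts")
```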
> I have replaced localhost with pc228 in core-site.xml and mapred-site.xml, and set the replication factor to 3.
> 
> I am able to ssh to pc321 and dn3.myslice.ch-geni-net.emulab.net from the master.
> 
> 
> hduser@pc228:/usr/local/hadoop/conf$ more slaves
> pc228
> pc321
> dn3.myslice.ch-geni-net.emulab.net
> 
> hduser@pc228:/usr/local/hadoop/conf$ more masters
> pc228
> hduser@pc228:/usr/local/hadoop/conf$
> 
> 
> 
> Am I doing anything wrong here?
> 
> 
> On Wed, Jan 1, 2014 at 4:54 PM, Hardik Pandya <smarty.juice@gmail.com> wrote:
> Do you have your hostnames properly configured in /etc/hosts? Have you tried 192.168.?.? instead of localhost 127.0.0.1?
> 
> 
> 
> On Wed, Jan 1, 2014 at 11:33 AM, navaz <navaz.enc@gmail.com> wrote:
> Thanks. But I wonder why map succeeds 100%. How does it resolve the hostname?
> 
> Now reduce reaches 100%, but it bails out on slave2 and slave3. (But mapping succeeded on these nodes.)
> 
> Does it look up the hostname only for reduce?
> 
> 
> 14/01/01 09:09:38 INFO mapred.JobClient: Running job: job_201401010908_0001
> 14/01/01 09:09:39 INFO mapred.JobClient:  map 0% reduce 0%
> 14/01/01 09:10:00 INFO mapred.JobClient:  map 33% reduce 0%
> 14/01/01 09:10:01 INFO mapred.JobClient:  map 66% reduce 0%
> 14/01/01 09:10:05 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:10:14 INFO mapred.JobClient:  map 100% reduce 22%
> 14/01/01 09:17:32 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:17:35 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:17:46 INFO mapred.JobClient:  map 100% reduce 11%
> 14/01/01 09:17:50 INFO mapred.JobClient:  map 100% reduce 22%
> 14/01/01 09:25:06 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:25:10 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_1, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:25:34 INFO mapred.JobClient:  map 100% reduce 100%
> 14/01/01 09:25:42 INFO mapred.JobClient: Job complete: job_201401010908_0001
> 14/01/01 09:25:42 INFO mapred.JobClient: Counters: 29
> 
> 
> 
> Job Tracker logs:
> 2014-01-01 09:09:59,874 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000002_0' has completed task_201401010908_0001_m_000002 successfully.
> 2014-01-01 09:10:04,231 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000001_0' has completed task_201401010908_0001_m_000001 successfully.
> 2014-01-01 09:17:30,527 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2014-01-01 09:17:30,528 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
> 2014-01-01 09:17:30,529 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_0' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:44663'
> 2014-01-01 09:17:35,130 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
> 2014-01-01 09:17:35,213 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
> 2014-01-01 09:25:05,494 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
> 2014-01-01 09:25:10,087 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
> 2014-01-01 09:25:10,109 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_2' to tip task_201401010908_0001_r_000000, for tracker 'tracker_master:localhost/127.0.0.1:57156'
> 2014-01-01 09:25:33,340 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_r_000000_2' has completed task_201401010908_0001_r_000000 successfully.
> 2014-01-01 09:25:33,462 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201401010908_0001_m_000003_0' to tip task_201401010908_0001_m_000003, for tracker 'tracker_master:localhost/127.0.0.1:57156'
> 2014-01-01 09:25:42,304 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000003_0' has completed task_201401010908_0001_m_000003 successfully.
> 
> 
> On Tue, Dec 31, 2013 at 4:56 PM, Hardik Pandya <smarty.juice@gmail.com> wrote:
> As expected, it's failing during the shuffle.
> 
> It seems like HDFS could not resolve the DNS names for the slave nodes.
> 
> Have you configured your slave hostnames correctly?
> 
> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> 
> 
> 
> 
> On Tue, Dec 31, 2013 at 4:42 PM, navaz <navaz.enc@gmail.com> wrote:
> Hi
> 
> My hdfs-site is configured for 4 nodes (one master and 3 slaves).
> 
> <property>
>  <name>dfs.replication</name>
>  <value>4</value>
> </property>
> 
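Worth noting, as an aside: dfs.replication should not exceed the number of datanodes, so with 3 datanodes a value of 4 leaves every block permanently under-replicated. A safer setting for this cluster would look like (a sketch, not from the thread):

```xml
<property>
  <name>dfs.replication</name>
  <!-- at most the number of datanodes; this cluster has 3 -->
  <value>3</value>
</property>
```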
> Running start-dfs.sh and stop-mapred.sh doesn't solve the problem.
> 
> I also tried to run the program after formatting the namenode (master), which also fails.
> 
> My JobTracker logs on the master (name node) are given below.
> 
> 
> 
> 2013-12-31 14:27:35,534 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004: nMaps=3 nReduces=1 max=-1
> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Job job_201312311107_0004 added successfully for user 'hduser' to queue 'default'
> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.AuditLogger: USER=hduser  IP=155.98.39.28 OPERATION=SUBMIT_JOB    TARGET=job_201312311107_0004     RESULT=SUCCESS
> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201312311107_0004
> 2013-12-31 14:27:35,595 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201312311107_0004
> 2013-12-31 14:27:35,785 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /app/hadoop/tmp/mapred/system/job_201312311107_0004/jobToken
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201312311107_0004 = 3671523. Number of splits = 3
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/master
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave2
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave1
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave3
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/master
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave1
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave3
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave2
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/master
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave1
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave2
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave3
> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004 LOCALITY_WAIT_FACTOR=1.0
> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201312311107_0004 initialized successfully with 3 map tasks and 1 reduce tasks.
> 2013-12-31 14:27:35,913 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201312311107_0004_m_000004_0' to tip task_201312311107_0004_m_000004, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> 2013-12-31 14:27:40,876 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000004_0' has completed task_201312311107_0004_m_000004 successfully.
> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000000_0' to tip task_201312311107_0004_m_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000000
> 2013-12-31 14:27:40,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000001_0' to tip task_201312311107_0004_m_000001, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
> 2013-12-31 14:27:40,908 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000001
> 2013-12-31 14:27:41,122 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000002_0' to tip task_201312311107_0004_m_000002, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
> 2013-12-31 14:27:41,123 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000002
> 2013-12-31 14:27:49,659 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000002_0' has completed task_201312311107_0004_m_000002 successfully.
> 2013-12-31 14:27:49,662 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0004_r_000000_0' to tip task_201312311107_0004_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
> 2013-12-31 14:27:50,338 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000000_0' has completed task_201312311107_0004_m_000000 successfully.
> 2013-12-31 14:27:51,168 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000001_0' has completed task_201312311107_0004_m_000001 successfully.
> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> hduser@pc228:/usr/local/hadoop/logs$
> 
> 
> I am referring to the document below to configure the Hadoop cluster.
> 
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
> 
> Did I miss something? Please guide.
> 
> Thanks
> Navaz
> 
> 
> On Tue, Dec 31, 2013 at 3:25 PM, Hardik Pandya <smarty.juice@gmail.com> wrote:
> What does your job log say? Is your hdfs-site configured properly to find 3 data nodes? This could very well be getting stuck in the shuffle phase.
> 
> Last thing to try: do stop-all and start-all help? As a last resort, try formatting the namenode.
> 
> 
> On Tue, Dec 31, 2013 at 11:40 AM, navaz <navaz.enc@gmail.com> wrote:
> Hi
> 
> 
> I am running Hadoop cluster with 1 name node and 3 data nodes. 
> 
> My HDFS looks like this.
> 
> hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
> Warning: $HADOOP_HOME is deprecated.
> 
> Found 7 items
> -rw-r--r--   4 hduser supergroup     343691 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg132.txt
> -rw-r--r--   4 hduser supergroup     594933 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg1661.txt
> -rw-r--r--   4 hduser supergroup    1945886 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg19699.txt
> -rw-r--r--   4 hduser supergroup     674570 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg20417.txt
> -rw-r--r--   4 hduser supergroup    1573150 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg4300.txt
> -rw-r--r--   4 hduser supergroup    1423803 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg5000.txt
> -rw-r--r--   4 hduser supergroup     393968 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg972.txt
> hduser@nm:/usr/local/hadoop$
> 
> When I start the MapReduce wordcount program, it reaches 100% mapping and reduce hangs at 14%.
> 
> hduser@nm:~$ hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/gutenberg /user/hduser/getty/gutenberg_out3
> Warning: $HADOOP_HOME is deprecated.
> 
> 13/12/31 09:31:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 13/12/31 09:31:07 INFO input.FileInputFormat: Total input paths to process : 7
> 13/12/31 09:31:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 13/12/31 09:31:08 WARN snappy.LoadSnappy: Snappy native library not loaded
> 13/12/31 09:31:08 INFO mapred.JobClient: Running job: job_201312310929_0001
> 13/12/31 09:31:09 INFO mapred.JobClient:  map 0% reduce 0%
> 13/12/31 09:31:29 INFO mapred.JobClient:  map 14% reduce 0%
> 13/12/31 09:31:34 INFO mapred.JobClient:  map 32% reduce 0%
> 13/12/31 09:31:35 INFO mapred.JobClient:  map 75% reduce 0%
> 13/12/31 09:31:36 INFO mapred.JobClient:  map 90% reduce 0%
> 13/12/31 09:31:37 INFO mapred.JobClient:  map 99% reduce 0%
> 13/12/31 09:31:38 INFO mapred.JobClient:  map 100% reduce 0%
> 13/12/31 09:31:43 INFO mapred.JobClient:  map 100% reduce 14%
> 
> <HANGS HERE>
> 
> Could you please help me resolve this issue?
> 
> 
> Thanks & Regards
> Abdul Navaz
> 
> 
> 
> 
> 
> 
> 
> -- 
> Abdul Navaz
> Masters in Network Communications
> University of Houston
> Houston, TX - 77204-4020
> Ph - 281-685-0388
> fabdulnavaz@uh.edu
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 


