hadoop-common-user mailing list archives

From "shangan" <shan...@corp.kaixin001.com>
Subject Re: RE: mapreduce doesn't work in my cluster
Date Wed, 18 Aug 2010 09:58:26 GMT
I'm sorry, I made a clerical mistake in the question: I pasted the core-site.xml configuration
under the hdfs-site.xml heading. I do, of course, rsync all the slaves with the master. Thank you for your help.
By the way, could there be another cause you know of, such as too many ip-host mapping pairs
for the same host in /etc/hosts?
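To make the question concrete, this is roughly how I check what each hostname resolves to on
every node; only a sketch, but what I worry about is a node's own hostname resolving to
127.0.0.1, which would match the 127.0.0.1 addresses in the tasktracker log:

hostname                          # should print the node's own name, e.g. vm153
getent hosts $(hostname)          # should show the LAN address (192.168.0.153), not 127.0.0.1
getent hosts vm148 vm152 vm154    # each should resolve to its 192.168.0.x address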




2010-08-18 



shangan 



From: xiujin yang 
Sent: 2010-08-18 17:35:30 
To: common-user@hadoop.apache.org 
Cc: 
Subject: RE: mapreduce doesn't work in my cluster 
 
Hi Shangan,
I noticed something strange: the two hdfs-site.xml files you posted are different.
Please check: did you rsync the conf directory from the master to the slaves?
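(For example, something along these lines, run from the master, would push the conf directory
to every slave; the path is only taken from the dfs.hosts.exclude value below and the hostnames
from your /etc/hosts, so treat it as a sketch:)

for h in vm148 vm152 vm154; do
    rsync -av /home/shangan/bin/hadoop-0.20.2/conf/ $h:/home/shangan/bin/hadoop-0.20.2/conf/
done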
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
   <property>
        <name>fs.default.name</name>
        <value>hdfs://vm153:9000</value>
   </property>
   <property>
        <name>fs.trash.interval</name>
        <value>20</value>
   </property>
<property>
  <name>fs.checkpoint.period</name>
  <value>300</value>
  <description>The number of seconds between two periodic checkpoints.
  </description>
</property>
</configuration>
[shangan@vm153 conf]$ more hdfs-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/shangan/bin/hadoop-0.20.2/conf/exclude</value>
</property>
</configuration>
Best,
Xiujin Yang
> From: akashakya@gmail.com
> Date: Wed, 18 Aug 2010 19:30:34 +1000
> Subject: Re: RE: mapreduce doesn't work in my cluster
> To: common-user@hadoop.apache.org
> 
> Please remove the localhost things, and you will probably be fine.
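> By "the localhost things" I mean roughly the following; this is only a sketch, the point
> is that none of the vm* hostnames should end up resolving to the loopback address:
> 
> 192.168.0.153           vm153
> 192.168.0.148           vm148
> 192.168.0.152           vm152
> 192.168.0.154           vm154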
> 
> Regards
> Akash Deep Shakya "OpenAK"
> University of New South Wales
> akashakya at gmail dot com
> 
> ~ Failure to prepare is preparing to fail ~
> 
> 
> 
> 2010/8/18 shangan <shangan@corp.kaixin001.com>
> 
> > 127.0.0.1               localhost.localdomain localhost
> > ::1             localhost6.localdomain6 localhost6
> > 192.168.0.153           vm153
> > 192.168.0.148           vm148
> > 192.168.0.152           vm152
> > 192.168.0.154           vm154
> >
> > vm153, vm148, vm152, and vm154 are the nodes I'm using in the cluster. Each node also
> > has some other ip-host mapping pairs for other purposes, and I don't know whether that
> > affects anything. Looking forward to your further help; I really appreciate it.
> >
> >
> > 2010-08-18
> >
> >
> >
> > shangan
> >
> >
> >
> > From: xiujin yang
> > Sent: 2010-08-18 17:08:27
> > To: common-user@hadoop.apache.org
> > Cc:
> > Subject: RE: mapreduce doesn't work in my cluster
> >
> > Hi Shangan,
> > Please check your /etc/hosts and make sure all the machines are set there.
> > Best,
> > Yang.
> > > Date: Wed, 18 Aug 2010 15:01:46 +0800
> > > From: shangan@corp.kaixin001.com
> > > To: common-user@hadoop.apache.org
> > > Subject: mapreduce doesn't work in my cluster
> > >
> > > My cluster consists of 4 nodes: 1 namenode and 3 datanodes. It works well as HDFS,
> > > but when I run mapreduce jobs they take quite a long time and there are a lot of
> > > "too many fetch-failures" errors. I've checked the logs on the datanodes and copied
> > > part of them below:
> > >
> > >
> > >
> > > 2010-08-18 14:28:33,142 WARN org.apache.hadoop.mapred.TaskTracker:
> > Unknown child with bad map output: attempt_201008171837_0007_m_000006_1.
> > Ignored.
> > > 2010-08-18 14:28:33,143 INFO
> > org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060,
> > dest: 127.0.0.1:54245, bytes: 0, op: MAPRED_SHUFFLE, cliID:
> > attempt_201008171837_0007_m_000006_1
> > > 2010-08-18 14:28:33,143 WARN org.mortbay.log: /mapOutput:
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index
> > in any of the configured local directories
> > > 2010-08-18 14:28:34,766 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:37,675 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:40,775 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:43,683 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:43,779 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:46,687 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:49,787 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:52,696 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:55,796 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:58,704 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:28:58,800 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:29:01,710 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000003_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:29:04,808 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_201008171837_0007_r_000000_1 0.31666666% reduce > copy (19 of 20 at
> > 0.00 MB/s) >
> > > 2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker:
> > getMapOutput(attempt_201008171837_0007_m_000006_1,0) failed :
> > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index
> > in any of the configured local directories
> > >         at
> > org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
> > >         at
> > org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
> > >         at
> > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
> > >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> > >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> > >         at
> > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> > >         at
> > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> > >         at
> > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > >         at
> > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> > >         at
> > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > >         at
> > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> > >         at
> > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > >         at
> > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > >         at org.mortbay.jetty.Server.handle(Server.java:324)
> > >         at
> > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> > >         at
> > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> > >         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> > >         at
> > org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> > >         at
> > org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> > >         at
> > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> > >         at
> > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> > > 2010-08-18 14:29:05,225 WARN org.apache.hadoop.mapred.TaskTracker:
> > Unknown child with bad map output: attempt_201008171837_0007_m_000006_1.
> > Ignored.
> > > 2010-08-18 14:29:05,259 INFO
> > org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.0.1:50060,
> > dest: 127.0.0.1:54288, bytes: 0, op: MAPRED_SHUFFLE, cliID:
> > attempt_201008171837_0007_m_000006_1
> > > 2010-08-18 14:29:05,259 WARN org.mortbay.log: /mapOutput:
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > taskTracker/jobcache/job_201008171837_0007/attempt_201008171837_0007_m_000006_1/output/file.out.index
> > in any of the configured local directories
> > >
> > >
> > >
> > > Almost all the datanodes behave the same way; it seems the reducers can't fetch the
> > > map output from the other datanodes. I also looked at the charts in the job
> > > administration page, and the copy phase did last quite a long time. Can anybody give
> > > me an explanation? My hadoop-0.20.2 configuration follows:
> > >
> > > core-site.xml
> > >
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > <!-- Put site-specific property overrides in this file. -->
> > > <configuration>
> > >    <property>
> > >         <name>fs.default.name</name>
> > >         <value>hdfs://vm153:9000</value>
> > >    </property>
> > >    <property>
> > >         <name>fs.trash.interval</name>
> > >         <value>20</value>
> > >    </property>
> > > <property>
> > >   <name>fs.checkpoint.period</name>
> > >   <value>300</value>
> > >   <description>The number of seconds between two periodic checkpoints.
> > >   </description>
> > > </property>
> > > </configuration>
> > >
> > > hdfs-site.xml
> > >
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > <!-- Put site-specific property overrides in this file. -->
> > > <configuration>
> > >    <property>
> > >         <name>fs.default.name</name>
> > >         <value>hdfs://vm153:9000</value>
> > >    </property>
> > >    <property>
> > >         <name>fs.trash.interval</name>
> > >         <value>20</value>
> > >    </property>
> > > <property>
> > >   <name>fs.checkpoint.period</name>
> > >   <value>300</value>
> > >   <description>The number of seconds between two periodic checkpoints.
> > >   </description>
> > > </property>
> > > </configuration>
> > > [shangan@vm153 conf]$ more hdfs-site.xml
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > <!-- Put site-specific property overrides in this file. -->
> > > <configuration>
> > > <property>
> > >   <name>dfs.replication</name>
> > >   <value>2</value>
> > > </property>
> > > <property>
> > >   <name>dfs.hosts.exclude</name>
> > >   <value>/home/shangan/bin/hadoop-0.20.2/conf/exclude</value>
> > > </property>
> > > </configuration>
> > >
> > > mapred-site.xml
> > >
> > > <?xml version="1.0"?>
> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > <!-- Put site-specific property overrides in this file. -->
> > > <configuration>
> > >    <property>
> > >         <name>mapred.job.tracker</name>
> > >         <value>vm153:9001</value>
> > >    </property>
> > >    <property>
> > >         <name>mapred.map.tasks</name>
> > >         <value>20</value>
> > >    </property>
> > >    <property>
> > >         <name>mapred.reduce.tasks</name>
> > >         <value>5</value>
> > >    </property>
> > > </configuration>
> > >
> > > WHAT'S THE PROBLEM? Do I need to configure other parameters? There are parameters
> > > like dfs.secondary.http.address and dfs.datanode.address whose IP is 0.0.0.0; do I
> > > need to change them?
> > >
> > > 2010-08-18
> > >
> > >
> > >
> > > shangan
> >
          