hadoop-general mailing list archives

From venkata subbarayudu <avsrit2...@gmail.com>
Subject Hadoop : Too many fetch failures -- Reducer doesn't start
Date Thu, 05 Nov 2009 12:27:57 GMT
Hi All,

I have set up a single-node Hadoop cluster (hadoop-version 0.20.0) on localhost
by following the instructions from
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
(i.e. the master and slave nodes are both 'localhost').
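
For reference, a single-node setup along those lines ends up with the core and
mapred configuration pointing everything at localhost, roughly like the sketch
below (the port numbers are just the ones commonly used in that tutorial, so
treat them as placeholders for whatever your setup actually uses):

        *core-site.xml*

        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:54310</value>
          <description>Default file system URI; points at localhost in a
            single-node setup.</description>
        </property>

        *mapred-site.xml*

        <property>
          <name>mapred.job.tracker</name>
          <value>localhost:54311</value>
          <description>Host and port of the JobTracker, also localhost here.</description>
        </property>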

I was able to run MapReduce jobs with no problems on that standalone system, and
now I am trying to set up the same thing on a different machine, which has two
IP addresses. The new setup looks fine (all of the Hadoop processes, i.e.
datanode, namenode, secondarynamenode, tasktracker, and jobtracker, start up),
and I am able to run jobs that have only map tasks. But for jobs with both map
and reduce tasks, the map tasks fail with a '*Too many fetch failures*' error.


Can somebody please give me some insight into how this problem can be
resolved?
---------------------------------------------------------------------------------------------------------------------------------------------------
I think the reason for this error is that the reduce tasks are somehow not able
to read the output of the map tasks, and that this communication happens over a
hostname determined by the following properties in the Hadoop configuration
files [please see the config snippets below].

        *hdfs-site.xml*

        <property>
          <name>dfs.datanode.dns.interface</name>
          <value>default</value>
          <description>The name of the Network Interface from which a data node
            should report its IP address.</description>
        </property>

        <property>
          <name>dfs.datanode.dns.nameserver</name>
          <value>default</value>
          <description>The host name or IP address of the name server (DNS)
            which a DataNode should use to determine the host name used by the
            NameNode for communication and display purposes.</description>
        </property>

        *mapred-site.xml*

        <property>
          <name>mapred.tasktracker.dns.interface</name>
          <value>default</value>
          <description>The name of the Network Interface from which a task
            tracker should report its IP address.</description>
        </property>

        <property>
          <name>mapred.tasktracker.dns.nameserver</name>
          <value>default</value>
          <description>The host name or IP address of the name server (DNS)
            which a TaskTracker should use to determine the host name used by
            the JobTracker for communication and display purposes.</description>
        </property>
-----------------------------------------------------------------------------------------------------------------------------------------------------------

Our server has two IP addresses, and the hadoop user can only reach the system
through the second IP. My belief is that, because the values above are set to
"default", Hadoop resolves the local hostname to the first IP (IP1) and uses it
for further communication. Since the hadoop user is not allowed to access the
system through IP1, the communication fails, the map/reduce tasks cannot
report/read the HDFS data, and Hadoop ends up throwing the 'Too many fetch
failures' error.
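
If that explanation is right, one thing I am considering is to stop using
"default" for these properties and instead name the interface that carries the
second (reachable) IP. A rough sketch of what I have in mind, where "eth1" is
just a placeholder for whichever NIC actually holds IP2 on our server:

        *hdfs-site.xml*

        <property>
          <name>dfs.datanode.dns.interface</name>
          <!-- placeholder: the interface bound to the reachable IP -->
          <value>eth1</value>
        </property>

        *mapred-site.xml*

        <property>
          <name>mapred.tasktracker.dns.interface</name>
          <!-- placeholder: the interface bound to the reachable IP -->
          <value>eth1</value>
        </property>

Would that be a reasonable way to force Hadoop onto IP2, or is there a
better-suited property for this?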

Please correct me if the above explanation is incorrect; a quick reply would be
much appreciated.

Thanks,
Rayudu.
