hadoop-mapreduce-user mailing list archives

From Suhail Rehman <suhailreh...@gmail.com>
Subject Re: Reducers are stuck fetching map data.
Date Tue, 26 Jan 2010 19:05:56 GMT
Yes, that will be immensely helpful for others.

Suhail

On Tue, Jan 26, 2010 at 9:52 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> You mean that documentation?
>
> http://hadoop.apache.org/common/docs/r0.20.1/quickstart.html#Required+Software
>
> J-D
>
> On Tue, Jan 26, 2010 at 1:34 AM, Suhail Rehman <suhailrehman@gmail.com> wrote:
> > We finally figured it out! The problem was with the JDK installation on
> > our VMs: they were configured to use the IBM JDK, and as soon as we
> > switched to the Sun JDK, everything worked flawlessly.
> >
> > You may want to state somewhere in the documentation that you strongly
> > recommend using the Sun JDK with Hadoop.
> >
> > Suhail
> >
> > On Thu, Jan 21, 2010 at 1:13 PM, Suhail Rehman <suhailrehman@gmail.com> wrote:
> >>
> >> We have verified that it does NOT solve the problem at all. This would
> >> lead us to believe that the timeout issue we are experiencing is not
> >> part of the shuffle phase. Any other ideas that might help us?
> >>
> >> The TaskTracker logs show that these reducers are stuck during the copy
> >> phase.
> >>
> >> Suhail
> >>
> >> On Wed, Jan 20, 2010 at 5:22 PM, Amareshwari Sri Ramadasu
> >> <amarsri@yahoo-inc.com> wrote:
> >>>
> >>> ReadTimeOuts are found to be costly during shuffle if the map runtime
> >>> is high.
> >>> Please see HADOOP-3327 (http://issues.apache.org/jira/browse/HADOOP-3327)
> >>> for the shuffle improvements made specifically for ReadTimeOut.
> >>>
> >>> Thanks
> >>> Amareshwari
> >>>
> >>> On 1/20/10 6:07 PM, "Suhail Rehman" <suhailrehman@gmail.com> wrote:
> >>>
> >>> We are having trouble running Hadoop MapReduce jobs on our cluster.
> >>>
> >>> The cluster consists of VMs running on an IBM blade center with the
> >>> following virtualized configuration:
> >>>
> >>> Master Node/NameNode: 1x
> >>>     OS: Xen RedHat Linux 5.2, CPU: 3 vCPU, RAM: 1024 MB
> >>> Slaves/DataNode: 3x
> >>>     OS: Xen RedHat Linux 5.2, CPU: 1 vCPU, RAM: 1024 MB
> >>>
> >>> We are working with the standard Hadoop example code. We are using
> >>> Hadoop 0.20.1, stable, with the latest patches installed. All VMs have
> >>> firewalls turned off and SELinux disabled.
> >>>
> >>> For example, when we try to execute the "wordcount" program on a
> >>> provisioned cluster, the Map operations complete successfully, but the
> >>> program gets stuck trying to complete the reduce operations.
> >>>
> >>> On examining the logs, we find that the Reducers are waiting for the
> >>> outputs from Map operations on other nodes. Our understanding is that
> >>> this communication happens over HTTP sockets, and all these provisioned
> >>> VMs have trouble communicating over the HTTP sockets on the ports that
> >>> Hadoop uses.
> >>>
> >>> Also, when we try to access the JobTracker web interface to view the
> >>> running jobs, we see that the machine takes too much time to respond to
> >>> our queries. Since both the Reducer communication and the JobTracker
> >>> web interface work over HTTP, we think the problem might be a
> >>> networking issue or a problem with the built-in HTTP service in Hadoop
> >>> (Jetty).
> >>>
> >>> Attached is a partial Task log from one of the Reducers. The message
> >>> "WARN org.apache.hadoop.mapred.ReduceTask:
> >>> java.net.SocketTimeoutException: Read timed out"
> >>> appears on all reducers, and eventually the job either fails to
> >>> complete or takes a very long time (about 15 hours to process an 11 GB
> >>> text file).
> >>>
> >>> This problem seems to be random: at times the program runs
> >>> successfully in about 20 minutes, and at other times it takes 15 hours
> >>> to complete.
> >>>
> >>> Any help with regards to this would be much appreciated.
> >>>
> >>> Regards,
> >>>
> >>> Suhail Rehman
> >>> MS by Research in Computer Science
> >>> International Institute of Information Technology - Hyderabad
> >>> rehman@research.iiit.ac.in
> >>> ---------------------------------------------------------------------
> >>> http://research.iiit.ac.in/~rehman
> >>>
>



-- 
Regards,

Suhail Rehman
MS by Research in Computer Science
International Institute of Information Technology - Hyderabad
rehman@research.iiit.ac.in
---------------------------------------------------------------------
http://research.iiit.ac.in/~rehman
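
Since the fix reported in this thread came down to which JDK the VMs were
running, a quick per-node check of the active JVM may help others who hit the
same symptom. The following is only a minimal sketch: the class name
JvmVendorCheck and the helper vendorMatches are hypothetical names introduced
here, and matching on the substring "IBM" is an assumption about what the
vendor string contains, since the exact value varies by JVM release. It relies
only on the standard system properties java.vendor, java.vm.name, and
java.version.

```java
public class JvmVendorCheck {

    // Builds a one-line summary from standard JVM system properties.
    static String describeJvm() {
        return "java.vendor=" + System.getProperty("java.vendor")
                + ", java.vm.name=" + System.getProperty("java.vm.name")
                + ", java.version=" + System.getProperty("java.version");
    }

    // Hypothetical helper: case-insensitive substring match on a vendor string.
    // Returns false when the vendor string is null.
    static boolean vendorMatches(String vendor, String expected) {
        return vendor != null
                && vendor.toLowerCase().contains(expected.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(describeJvm());

        // Assumption: an IBM JVM reports a vendor string containing "IBM".
        if (vendorMatches(System.getProperty("java.vendor"), "IBM")) {
            System.out.println("This node appears to be running an IBM JVM.");
        }
    }
}
```

To be meaningful, this should be run on every node with the same java binary
that Hadoop itself uses (the one under the JAVA_HOME set in
conf/hadoop-env.sh), because a node can have more than one JDK installed.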
