hadoop-mapreduce-user mailing list archives

From Suhail Rehman <suhailreh...@gmail.com>
Subject Re: Reducers are stuck fetching map data.
Date Tue, 26 Jan 2010 09:34:12 GMT
We finally figured it out! The problem was with the JDK installation on our
VMs: it was configured to use the IBM JDK, and the moment we switched to the
Sun JDK, everything worked flawlessly.

You may want to note somewhere in the documentation that you *strongly
recommend* the Sun JDK for use with Hadoop.
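
For anyone who runs into the same symptom, a quick way to confirm which JVM a node is actually using is to print the standard Java system properties. This is only a minimal sketch: the class name JvmVendorCheck is illustrative and not part of Hadoop, and it reports on whichever java binary you launch it with.

```java
public class JvmVendorCheck {
    /** Builds a one-line summary of the JVM this process is running on. */
    public static String describeJvm() {
        // java.vendor, java.version and java.vm.name are standard system properties.
        return "java.vendor=" + System.getProperty("java.vendor")
                + ", java.version=" + System.getProperty("java.version")
                + ", java.vm.name=" + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describeJvm());
    }
}
```

Run it on each node with the java binary that Hadoop is configured to use (JAVA_HOME in conf/hadoop-env.sh), since that may differ from the default java on the PATH.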

Suhail

On Thu, Jan 21, 2010 at 1:13 PM, Suhail Rehman <suhailrehman@gmail.com> wrote:

>
> We have verified that it does NOT solve the problem at all.  This would
> lead us to believe that the timeout issue we are experiencing is not part of
> the shuffle phase. Any other ideas that might help us?
>
> The Tasktracker logs show that these reducers are stuck during the copy
> phase.
>
> Suhail
>
>
> On Wed, Jan 20, 2010 at 5:22 PM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
>
>> ReadTimeOuts are found to be costly during shuffle if the map runtime
>> is high.
>> Please see HADOOP-3327 (http://issues.apache.org/jira/browse/HADOOP-3327)
>> for the shuffle improvements made specifically for ReadTimeOut.
>>
>> Thanks
>> Amareshwari
>>
>>
>> On 1/20/10 6:07 PM, "Suhail Rehman" <suhailrehman@gmail.com> wrote:
>>
>> We are having trouble running Hadoop MapReduce jobs on our cluster.
>>
>> The VMs run on an IBM blade center with the following virtualized
>> configuration:
>>
>> Master Node/NameNode: 1x
>>   OS: Xen RedHat Linux 5.2, CPU: 3 vCPU, RAM: 1024 MB
>> Slaves/DataNodes: 3x
>>   OS: Xen RedHat Linux 5.2, CPU: 1 vCPU, RAM: 1024 MB
>>
>> We are working with the standard Hadoop example code. We are using the
>> stable Hadoop 0.20.1 release with the latest patches installed. All VMs
>> have firewalls turned off and SELinux disabled.
>>
>> For example, when we try to run the "wordcount" program on a
>> provisioned cluster, the Map operations complete successfully, but the
>> program gets stuck trying to complete the Reduce operations.
>>
>> On examining the logs, we find that the Reducers are waiting for the
>> outputs from Map operations on other nodes. Our understanding is that this
>> communication happens over HTTP, and all of the provisioned VMs have
>> trouble communicating over HTTP on the ports that Hadoop uses.
>>
>> Also, when we try to access the JobTracker web interface to view the
>> running jobs, the machine takes too long to respond to our queries.
>> Since both the Reducer communication and the JobTracker web interface
>> work over HTTP, we think the problem might be a networking issue or a
>> problem with the built-in HTTP service in Hadoop (Jetty).
>>
>> Attached is a partial task log from one of the Reducers. The warning
>> "WARN org.apache.hadoop.mapred.ReduceTask:
>> java.net.SocketTimeoutException: Read timed out"
>> appears on all Reducers, and eventually the job either fails to complete
>> or takes a very long time (about 15 hours to process an 11 GB text file).
>>
>> This problem seems to be random: at times the program runs successfully
>> in about 20 minutes, and at other times it takes 15 hours to complete.
>>
>> Any help with regards to this would be much appreciated.
>>
>> Regards,
>>
>> Suhail Rehman
>> MS by Research in Computer Science
>> International Institute of Information Technology - Hyderabad
>> rehman@research.iiit.ac.in
>> ---------------------------------------------------------------------
>> http://research.iiit.ac.in/~rehman
>>
>>
>
>
>



-- 
Regards,

Suhail Rehman
MS by Research in Computer Science
International Institute of Information Technology - Hyderabad
rehman@research.iiit.ac.in
---------------------------------------------------------------------
http://research.iiit.ac.in/~rehman
