hadoop-mapreduce-user mailing list archives

From felix gao <gre1...@gmail.com>
Subject Re: Job takes a very long time to complete
Date Thu, 14 Jul 2011 21:22:56 GMT
Bobby,

Thanks for the information. It was the resolver that was making it slow.
After we put the IP-to-host mappings in the /etc/hosts file, everything took
off like a space shuttle.
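
For anyone who hits the same symptom, here is a minimal sketch of how slow
name resolution could be checked from a node. It is only an illustration:
the function names, the host list, and the 1 second threshold are my own
assumptions, not anything taken from Hadoop. It times each lookup and
reports the hosts that fail or take too long, which would be consistent
with the bursts roughly five seconds apart in the jobtracker log quoted
below.

```python
import socket
import time


def time_lookup(host, resolve=socket.gethostbyname):
    """Return (address_or_None, elapsed_seconds) for one name lookup."""
    start = time.monotonic()
    try:
        address = resolve(host)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        address = None
    return address, time.monotonic() - start


def slow_hosts(hosts, threshold=1.0, resolve=socket.gethostbyname):
    """Return (host, seconds) pairs for lookups that failed or were slow.

    threshold is a hypothetical cut-off in seconds; pick one that suits
    your network.
    """
    slow = []
    for host in hosts:
        address, elapsed = time_lookup(host, resolve)
        if address is None or elapsed >= threshold:
            slow.append((host, elapsed))
    return slow


if __name__ == "__main__":
    # Replace "localhost" with the hostnames of your own cluster nodes.
    for host, seconds in slow_hosts(["localhost"]):
        print(f"{host}: {seconds:.2f}s")
```

If every node name shows up in the output before the /etc/hosts change and
none afterwards, the resolver is a likely culprit. The resolve parameter is
injectable so the check can be tried with a stub instead of the real
resolver.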

Felix

On Thu, Jul 14, 2011 at 1:58 PM, Robert Evans <evans@yahoo-inc.com> wrote:

>  Felix,
>
> I am not an expert on networking by any means, but BGP is Border Gateway
> Protocol.  It is used to help a router decide the best way to get the
> packets to where they need to be.  If it is wrong then your packets could be
> taking the long way from one box to another.  Have you tried running any
> networking benchmark tests, even just ping or talking to your hosting
> company about it?  It looks like HDFS is very slow, which is probably
> because the network is slow.  The network can be slow for all kinds of
> reasons, and your hosting company is probably in the best position to help
> you debug it.
>
> --Bobby
>
>
> On 7/14/11 3:45 PM, "felix gao" <gre1600@gmail.com> wrote:
>
> We didn't do anything on the cluster end. The company that hosts our cluster
> did a BGP update (whatever that means) and a full reset (I think just a reboot
> of the switches).
>
> On Thu, Jul 14, 2011 at 1:27 PM, Robert Evans <evans@yahoo-inc.com> wrote:
>
> Felix,
>
> So did you change anything except the network configuration?  What did you
> do to fix the “networking issues”?
>
> --Bobby Evans
>
>
> On 7/14/11 2:46 PM, "felix gao" <gre1600@gmail.com> wrote:
>
> Recently we had some network issues with our cluster.  This job used to
> take only a few minutes to complete and now it is taking over half an hour.
>
> When looking at the jobtracker's log I see it slowly getting all the split
> information (the list is not exhaustive):
> 2011-07-14 14:42:51,434 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0005_m_002488 has split on node:/default-rack/x.com
> 2011-07-14 14:42:56,465 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0005_m_002489 has split on node:/default-rack/x.com
> 2011-07-14 14:43:01,446 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0019_m_000218 has split on node:/default-rack/x.com
> 2011-07-14 14:43:01,466 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0010_m_001703 has split on node:/default-rack/x.com
> 2011-07-14 14:43:01,490 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0005_m_002489 has split on node:/default-rack/x.com
> 2011-07-14 14:43:06,469 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0010_m_001703 has split on node:/default-rack/x.com
> 2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0019_m_000218 has split on node:/default-rack/x.com
> 2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0019_m_000219 has split on node:/default-rack/x.com
> 2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0019_m_000219 has split on node:/default-rack/x.com
> 2011-07-14 14:43:11,500 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0019_m_000220 has split on node:/default-rack/x.com
> 2011-07-14 14:43:11,542 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0005_m_002491 has split on node:/default-rack/x.com
> 2011-07-14 14:43:16,526 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0019_m_000224 has split on node:/default-rack/x.com
> 2011-07-14 14:43:16,526 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0019_m_000225 has split on node:/default-rack/x.com
> 2011-07-14 14:43:16,567 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0005_m_002491 has split on node:/default-rack/x.com
> 2011-07-14 14:45:26,791 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0025_m_000001 has split on node:/default-rack/x.com
> 2011-07-14 14:45:28,696 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0005_m_002509 has split on node:/default-rack/x.com
> 2011-07-14 14:45:31,770 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0010_m_001722 has split on node:/default-rack/x.com
> 2011-07-14 14:45:31,815 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201107141056_0025_m_000002 has split on node:/default-rack/x.com
>
>
> 250 mappers took about 25 min to run, with 10 min spent on generating the
> tasks.  The question is: what could have caused this slowdown?
>
> Thanks,
>
> Felix
