hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Job takes a very long time to complete
Date Thu, 14 Jul 2011 20:58:59 GMT
Felix,

I am not an expert on networking by any means, but BGP is the Border Gateway Protocol. It is
used to help a router decide the best way to get packets to where they need to go. If the
routing it picks is wrong, your packets could be taking the long way from one box to another.
Have you tried running any networking benchmark tests, even just ping, or talking to your hosting
company about it? It looks like HDFS is very slow, which is probably because the network
is slow. The network can be slow for all kinds of reasons, and your hosting company is probably
in the best position to help you debug it.
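As a rough first check in the spirit of the ping suggestion, a minimal Python sketch could time TCP connections from one node to another. It is an assumption-laden stand-in for ping: it measures TCP handshake time rather than ICMP round trips, so it needs a port that is actually listening on the target. The function names and the host/port in the usage line are hypothetical, not anything from this thread.

```python
import socket
import statistics
import time


def tcp_connect_latencies(host, port, count=5, timeout=2.0):
    """Return TCP connect times in milliseconds to host:port.

    Hypothetical helper, a rough stand-in for ping: it times only the
    TCP handshake, so the target must have a listening port.
    """
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        # create_connection raises OSError (timeout, refused, DNS failure) on error
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000.0
        latencies.append(elapsed_ms)
    return latencies


def summarize(latencies):
    """Return (min, median, max) of a non-empty list of latencies."""
    return min(latencies), statistics.median(latencies), max(latencies)
```

Usage (hypothetical host; 50010 was the default DataNode data-transfer port in Hadoop releases of that era, but check `dfs.datanode.address` on your cluster): `summarize(tcp_connect_latencies("datanode.example.com", 50010, count=10))`. Comparing the numbers between pairs of nodes, and against what they were before the network change, is more telling than any single value.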

--Bobby

On 7/14/11 3:45 PM, "felix gao" <gre1600@gmail.com> wrote:

We didn't do anything on the cluster end. The company that hosts our cluster did a BGP update
(whatever that means) and a full reset (I think just a reboot of the switches).

On Thu, Jul 14, 2011 at 1:27 PM, Robert Evans <evans@yahoo-inc.com> wrote:
Felix,

So did you change anything except the network configuration?  What did you do to fix the "networking
issues"?

--Bobby Evans


On 7/14/11 2:46 PM, "felix gao" <gre1600@gmail.com> wrote:

Recently we had some network issues with our cluster. This job used to take only a few minutes
to complete, and now it is taking over half an hour.

When looking at the JobTracker's log, I see it slowly getting all the split information (the
list is not exhaustive):
2011-07-14 14:42:51,434 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002488 has split on node:/default-rack/x.com
2011-07-14 14:42:56,465 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002489 has split on node:/default-rack/x.com
2011-07-14 14:43:01,446 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000218 has split on node:/default-rack/x.com
2011-07-14 14:43:01,466 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0010_m_001703 has split on node:/default-rack/x.com
2011-07-14 14:43:01,490 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002489 has split on node:/default-rack/x.com
2011-07-14 14:43:06,469 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0010_m_001703 has split on node:/default-rack/x.com
2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000218 has split on node:/default-rack/x.com
2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000219 has split on node:/default-rack/x.com
2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000219 has split on node:/default-rack/x.com
2011-07-14 14:43:11,500 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000220 has split on node:/default-rack/x.com
2011-07-14 14:43:11,542 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002491 has split on node:/default-rack/x.com
2011-07-14 14:43:16,526 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000224 has split on node:/default-rack/x.com
2011-07-14 14:43:16,526 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000225 has split on node:/default-rack/x.com
2011-07-14 14:43:16,567 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002491 has split on node:/default-rack/x.com
2011-07-14 14:45:26,791 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0025_m_000001 has split on node:/default-rack/x.com
2011-07-14 14:45:28,696 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002509 has split on node:/default-rack/x.com
2011-07-14 14:45:31,770 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0010_m_001722 has split on node:/default-rack/x.com
2011-07-14 14:45:31,815 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0025_m_000002 has split on node:/default-rack/x.com
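To put a number on "slowly", one option is to measure the gaps between consecutive JobInProgress split entries. The following is a minimal Python sketch, assuming one entry per line as in the JobTracker log file itself and the line format shown in the excerpt; the regex and function names are hypothetical helpers, not part of Hadoop.

```python
import re
from datetime import datetime

# Matches the leading timestamp and TIP id of a JobInProgress split line, e.g.
# "2011-07-14 14:42:51,434 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_... has split on ..."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"org\.apache\.hadoop\.mapred\.JobInProgress: tip:(\S+)"
)


def parse_entries(lines):
    """Yield (timestamp, tip_id) for each line that matches LINE_RE."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # %f accepts the 3-digit millisecond field and pads it to microseconds
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            yield ts, m.group(2)


def gaps_seconds(lines):
    """Return the gaps, in seconds, between consecutive matching entries."""
    stamps = [ts for ts, _ in parse_entries(lines)]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Run over the excerpt, this shows entries arriving in bursts roughly 5 seconds apart. Because the excerpt is not exhaustive, any larger gap in it may simply reflect omitted lines rather than a real stall.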


The 250 mappers took about 25 minutes to run, with about 10 minutes spent generating the tasks.
The question is: what could have caused this slowdown?

Thanks,

Felix



