hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "FAQ" by ToddLipcon
Date Thu, 28 May 2009 06:17:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by ToddLipcon:
http://wiki.apache.org/hadoop/FAQ

------------------------------------------------------------------------------
  
  You can subclass the [http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/OutputFormat.java?view=markup
   OutputFormat.java] class and write your own. You can look at the code of [http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/TextOutputFormat.java?view=markup
     TextOutputFormat] [http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/MultipleOutputFormat.java?view=markup
   MultipleOutputFormat.java] etc. for reference. You may only need
to make minor changes to one of the existing output format classes. In that case, just
subclass that class and override the methods you need to change.
  
+ [[BR]]
+ [[Anchor(28)]]
+ '''28. [#28 How much network bandwidth might I need between racks in a medium-sized (40-80
node) Hadoop cluster?]'''
+ 
+ The real answer depends on the types of jobs you're running. A back-of-the-envelope calculation
might go something like this:
+ 
+ 60 nodes total on 2 racks = 30 nodes per rack
+ Each node might process about 100MB/sec of data
+ In the case of a sort job where the intermediate data is the same size as the input data,
that means each node needs to shuffle 100MB/sec of data
+ In aggregate, each rack is then producing about 3GB/sec of data
+ However, assuming reducers are spread evenly across the racks, each rack will need to send half of that, 1.5GB/sec,
to reducers running on the other rack.
+ Since the connection is full duplex, both racks can send at the same time, so you need 1.5GB/sec of bisection bandwidth
for this theoretical job. At 8 bits per byte, that's 12Gbps.
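
The arithmetic above can be reproduced with a short script. This is only a sketch of the estimate as described: the function name `cross_rack_gbps` and its parameters are made up for illustration, it assumes nodes and reducers are spread evenly across racks, and it uses decimal units (1 GB = 1000 MB).

```python
# Back-of-the-envelope inter-rack bandwidth estimate for a sort-like job.
# The inputs are the illustrative figures from the text, not measurements.

def cross_rack_gbps(total_nodes, racks, node_mb_per_sec, shuffle_ratio=1.0):
    """Estimate the bandwidth each rack must send to other racks, in Gbps.

    shuffle_ratio is intermediate data size divided by input data size
    (1.0 for a sort job, where the two are the same size).
    Assumes nodes and reducers are spread evenly across racks.
    """
    nodes_per_rack = total_nodes / racks
    # Intermediate data produced by one rack, in MB/sec.
    rack_output_mb = nodes_per_rack * node_mb_per_sec * shuffle_ratio
    # With reducers spread evenly, the share that leaves the rack
    # is (racks - 1) / racks of what the rack produces.
    outbound_mb = rack_output_mb * (racks - 1) / racks
    # Convert MB/sec to Gbps: 8 bits per byte, 1000 MB per GB.
    return outbound_mb * 8 / 1000


estimate = cross_rack_gbps(total_nodes=60, racks=2, node_mb_per_sec=100)
print(f"{estimate:.1f} Gbps")
```

Changing `total_nodes` or `node_mb_per_sec` shows how quickly the requirement moves with cluster size and per-node throughput.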
+ 
+ However, the above calculation is probably something of an upper bound. A large number
of jobs have significant data reduction during the map phase, either through some kind of filtering/selection
in the Mapper itself, or through good use of Combiners. Additionally, intermediate data
compression can cut the intermediate data transfer by a significant factor. Lastly, although
your disks can probably provide 100MB/sec of sustained throughput, it's rare to see a MapReduce job that
can sustain disk-speed IO through the entire pipeline. So, I'd say my estimate is at least
a factor of 2 too high.
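
To see how these effects pull the number down, one can scale the 12Gbps upper bound by a factor for each. The sketch below is hypothetical: the function `derated_gbps`, its parameter names, and the example values 0.5 and 0.75 are assumptions chosen for illustration, not measured ratios. Real factors depend entirely on the job.

```python
# Scale an upper-bound bandwidth estimate by the reductions described above.
# Each factor is the fraction of traffic that remains (1.0 means no reduction).
# The factor values used below are hypothetical, not measurements.

def derated_gbps(upper_bound_gbps, map_reduction=1.0, compression=1.0,
                 pipeline_efficiency=1.0):
    """Return the upper bound multiplied by each remaining-traffic fraction."""
    factors = {
        "map_reduction": map_reduction,              # filtering or Combiners
        "compression": compression,                  # intermediate compression
        "pipeline_efficiency": pipeline_efficiency,  # job below disk speed
    }
    for name, value in factors.items():
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    return upper_bound_gbps * map_reduction * compression * pipeline_efficiency


upper_bound = 12.0  # Gbps, from the sort-job estimate
# A single hypothetical factor of 0.5 matches "a factor of 2 too high".
print(derated_gbps(upper_bound, map_reduction=0.5))
# Adding a hypothetical 0.75 compression factor lowers it further.
print(derated_gbps(upper_bound, map_reduction=0.5, compression=0.75))
```

The factors multiply, so even modest reductions at each stage add up quickly.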
+ 
+ So, the simple answer is that 4-6Gbps is most likely just fine for most practical jobs.
If you want to be extra safe, many inexpensive switches can operate in a "stacked" configuration
where the bandwidth between them is essentially backplane speed. That should scale to
96 nodes with plenty of headroom. Many inexpensive gigabit switches also have one or two 10GigE
ports, which can be used to connect the switches to each other or to a 10GigE core.
+ 
