hadoop-common-user mailing list archives

From Brad Sarsfield <b...@bing.com>
Subject RE: Choosing IO intensive and CPU intensive workloads
Date Fri, 09 Dec 2011 16:23:58 GMT
Hi Arun 

TestDFSIO is good; I also like "Teragen/Terasort" as an IO benchmark to help understand the IO
capabilities of your hardware and network (run it at GB scale if you want to look at a single
box). There are a number of dials you can turn in your experiment that will reveal different
things about your setup.

The other thing you'll want to work out is the total number of tasks; a slight oversubscription
of map/reduce tasks to cores, depending on your workload, may be a good place to start optimizing.
Knowing what each of your hardware configurations (B1 and B2 in your case) is capable of
will help you set expectations for what the box is physically able to do.
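
As a rough illustration of "slight oversubscription", the sketch below turns a core count into a starting number of concurrent tasks per node. The function name and the 1.25 factor are assumptions made for this example only, not values from this thread or from Hadoop; treat the result as a first guess to measure against.

```python
import math


def suggested_task_slots(cores: int, oversubscription: float = 1.25) -> int:
    """Return a starting count of concurrent map/reduce tasks for one node.

    The default factor of 1.25 is a hypothetical starting point, not a
    Hadoop recommendation; tune it against measured job times.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if oversubscription <= 0:
        raise ValueError("oversubscription must be positive")
    # Round up so that even a small factor adds at least a fraction of a task.
    return math.ceil(cores * oversubscription)


if __name__ == "__main__":
    for cores in (4, 8, 16):
        print(f"{cores} cores -> {suggested_task_slots(cores)} task slots")
```

Run the benchmark at this value and at one step above and below it, changing nothing else, to see where throughput stops improving.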

How?
Generate: hadoop jar hadoop-examples-xxx-.jar teragen -conf terasort.xml 100000000 10GBsort-input
Sort: hadoop jar hadoop-examples-xxx-.jar terasort -conf terasort.xml 10GBsort-input 10GBsort-output
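
The first teragen argument is a row count, not a byte size. TeraGen writes fixed 100-byte rows, so 100000000 rows comes to 10 GB (decimal). A minimal sketch of that arithmetic, with a hypothetical helper name, in case you want a different data size:

```python
TERAGEN_ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(target_bytes: int) -> int:
    """Return the row count to pass to teragen for roughly target_bytes of data."""
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # Ceiling division so the generated data is never smaller than the target.
    return -(-target_bytes // TERAGEN_ROW_BYTES)


if __name__ == "__main__":
    ten_gb = 10 * 10**9  # decimal gigabytes, matching the 10GB run above
    print(teragen_rows(ten_gb))  # 100000000
```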

Then in terasort.xml you can play with many values; remember to change only one at a time.
10GB should work in your case:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>25</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>10</value>
  </property>	
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value> <!-- 536870912 == 512 MB, 268435456 == 256 MB, 134217728 == 128 MB -->
  </property>
.... etc
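
The dfs.block.size values in the comment above are powers-of-two megabytes expressed in bytes. The sketch below reproduces them and gives a rough count of HDFS blocks for a 10 GB input at each size. It treats the input as one contiguous file, which is a simplifying assumption: teragen actually writes one file per map task, so real block and split counts can differ slightly. The helper names are made up for this example.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def block_size_bytes(mib: int) -> int:
    """Return a dfs.block.size value in bytes for a size given in MiB."""
    return mib * MIB


def estimated_blocks(data_bytes: int, block_bytes: int) -> int:
    """Estimate the block count, treating the data as one contiguous file."""
    # Ceiling division: a partially filled final block still counts as a block.
    return -(-data_bytes // block_bytes)


if __name__ == "__main__":
    data_bytes = 10 * 10**9  # the 10 GB teragen output from the commands above
    for mib in (128, 256, 512):
        size = block_size_bytes(mib)
        print(f"{mib} MiB = {size} bytes -> ~{estimated_blocks(data_bytes, size)} blocks")
```

Larger blocks mean fewer, longer-running map tasks over the same data, which is one reason to change this value on its own and rerun.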

-----Original Message-----
From: alo alt [mailto:wget.null@googlemail.com] 
Sent: Friday, December 09, 2011 2:23 AM
To: common-user@hadoop.apache.org
Subject: Re: Choosing IO intensive and CPU intensive workloads

Hi Arun,

In hadoop-*test*.jar we have a lot of test cases; could any of them match yours?
#> cd /usr/lib/hadoop-0.20/ && hadoop jar hadoop-*test*.jar

- Alex



On Fri, Dec 9, 2011 at 10:58 AM, ArunKumar <arunk786@gmail.com> wrote:

> Alex,
>
> To see the behavior of a single node under a compute-intensive benchmark,
> which parameters other than the finish time of the jobs are available or
> can be considered?
>
> Arun
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Choosing-IO-and-CPU-intensive-workloads-tp3572282p3572519.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>



--
Alexander Lorenz
http://mapredit.blogspot.com


