hadoop-common-user mailing list archives

From David Saile <da...@uni-koblenz.de>
Subject Re: Benchmarking pipelined MapReduce jobs
Date Thu, 24 Feb 2011 11:28:14 GMT
Thanks for your help! 

I had a look at the gridmix_config.xml file in the gridmix2 directory. However, I'm having difficulties mapping the descriptions of the simulated jobs from the README file
1) Three stage map/reduce job
2) Large sort of variable key/value size
3) Reference select
4) API text sort (java, streaming)
5) Jobs with combiner (word count jobs)

to the jobs names in gridmix_config.xml: 
-streamSort
-javaSort
-combiner
-monsterQuery
-webdataScan
-webdataSort	

I would really appreciate any help getting the right configuration! Which job do I have to enable to simulate a pipelined execution as described in "1) Three stage map/reduce job"?
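In case it helps others with the same question: below is a hypothetical sketch of how one might enable a single job type in gridmix_config.xml by setting its job count to a nonzero value and zeroing out the others. The property names here (the <jobName>.<size>Jobs.numOfJobs pattern) are an assumption based on the job names listed above; please check them against the actual file shipped with your Hadoop distribution.

```xml
<!-- Hypothetical sketch: run only one GridMix2 job type.
     Property names are assumed to follow the
     <jobName>.<size>Jobs.numOfJobs pattern; verify against
     the gridmix_config.xml in your gridmix2 directory. -->
<property>
  <name>monsterQuery.smallJobs.numOfJobs</name>
  <value>1</value>
</property>
<property>
  <name>streamSort.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
<property>
  <name>javaSort.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
<property>
  <name>combiner.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
<property>
  <name>webdataScan.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
<property>
  <name>webdataSort.smallJobs.numOfJobs</name>
  <value>0</value>
</property>
```

The same pattern would presumably apply to the medium and large job-size variants, if the config defines them.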

Thanks
David 

On 23.02.2011, at 04:01, Shrinivas Joshi wrote:

> I am not sure about this but you might want to take a look at the GridMix config file. FWIU, it lets you define the # of jobs for different workloads and categories.
> 
> HTH,
> -Shrinivas
> 
> On Tue, Feb 22, 2011 at 10:46 AM, David Saile <david@uni-koblenz.de> wrote:
> Hello everybody,
> 
> I am trying to benchmark a Hadoop cluster with regard to the throughput of pipelined MapReduce jobs.
> Looking for benchmarks, I found the "Gridmix" benchmark that is supplied with Hadoop. Its README file says that part of this benchmark is a "Three stage map/reduce job".
> 
> As this seems to match my needs, I was wondering if it is possible to configure "Gridmix" to run only this job (without the rest of the "Gridmix" benchmark)?
> Or do I have to build my own benchmark? If this is the case, which classes are used by this "Three stage map/reduce job"?
> 
> Thanks for any help!
> 
> David
> 
> 

