hadoop-common-dev mailing list archives

From "Sanjay Dahiya (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-307) Many small jobs benchmark for MapReduce
Date Thu, 29 Jun 2006 12:45:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-307?page=comments#action_12418458 ] 

Sanjay Dahiya commented on HADOOP-307:

Configuration options are:
    input   (local file)
    output  (DFS path)
    times   (number of times to execute the job)
    jarFile (Mapper & Reducer)
    wordDir (temp output from intermediate tasks)
    maps    (number of maps)
    reduces (number of reduces)
I am not yet validating the bytes, but I will add that. The number of map and reduce tasks
can also be configured; it is passed to JobConf. The benchmark sets up multiple MapReduce tasks
in sequence, and the output of each job is passed as input to the next execution of the same job.
It uses TextInputFormat by default, and that is not configurable yet.
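
The chaining described above can be sketched roughly as follows. This is only an illustration in Python, not the benchmark's actual code: `BenchmarkOptions`, `run_chain`, and the `run_job` callback are hypothetical names, the field types are guesses, and sending the last run to `output` while intermediate runs go under `wordDir` is an assumption. In the real benchmark `run_job` would be a MapReduce job submission configured through JobConf.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkOptions:
    # Field names mirror the options listed in the comment; types are assumed.
    input: str      # local input file
    output: str     # DFS output path
    times: int      # number of times to execute the job
    jar_file: str   # jar containing the Mapper and Reducer
    word_dir: str   # temp output from intermediate runs
    maps: int       # number of map tasks
    reduces: int    # number of reduce tasks


def run_chain(opts: BenchmarkOptions,
              run_job: Callable[[str, str, int, int], None]) -> List[float]:
    """Run the same job opts.times times, feeding each output to the next run.

    run_job is a hypothetical stand-in for submitting one MapReduce job.
    Returns the wall-clock duration of each run in seconds.
    """
    durations = []
    current_input = opts.input
    for i in range(opts.times):
        # Assumption: intermediate runs write under word_dir, the last to output.
        is_last = i == opts.times - 1
        current_output = opts.output if is_last else f"{opts.word_dir}/run-{i}"
        start = time.perf_counter()
        run_job(current_input, current_output, opts.maps, opts.reduces)
        durations.append(time.perf_counter() - start)
        # The output of this run becomes the input of the next one.
        current_input = current_output
    return durations
```

Passing a stub for `run_job` that only records its arguments is enough to check the input/output wiring before trying anything on a cluster.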

I was out sick, hence the delay in responding. I have yet to run this on a cluster; I should
post the results by tomorrow.

> Many small jobs benchmark for MapReduce
> ---------------------------------------
>          Key: HADOOP-307
>          URL: http://issues.apache.org/jira/browse/HADOOP-307
>      Project: Hadoop
>         Type: Task

>   Components: mapred
>     Reporter: Sanjay Dahiya
>     Priority: Minor

> A benchmark that runs many small MapReduce tasks in sequence. A single MapReduce implementation
> is used; it is invoked multiple times, with the output of the previous run as input. The input
> to the first Map is a TextInputFormat (a text file of a few hundred KB). Input records are passed
> to the output without much processing. The idea is to benchmark the time taken by initialization
> of the Mapper and Reducer. Initial prototyping on a single machine with 20 MR tasks in sequence
> took ~47 seconds per task. Looking for suggestions on what else can be included in the benchmark.

This message is automatically generated by JIRA.
