Date: Sun, 6 Aug 2006 02:20:14 -0700 (PDT)
From: "Sanjay Dahiya (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-307) Many small jobs benchmark for MapReduce

[ http://issues.apache.org/jira/browse/HADOOP-307?page=all ]

Sanjay Dahiya updated HADOOP-307:
---------------------------------

    Attachment: patch.txt

Patch for classpath issues.
The benchmark can now be run with the hadoop script, without setting any extra classpath:

    $HADOOP_HOME/bin/hadoop jar smallJobsBenchmark

See Readme.txt for an example of the options. The bin/run.sh script is an optional helper for running the benchmark multiple times with different input configurations.

Thanks to Uros for pointing this out.

> Many small jobs benchmark for MapReduce
> ---------------------------------------
>
>          Key: HADOOP-307
>          URL: http://issues.apache.org/jira/browse/HADOOP-307
>      Project: Hadoop
>   Issue Type: Task
>   Components: mapred
>     Reporter: Sanjay Dahiya
>  Assigned To: Sanjay Dahiya
>     Priority: Minor
>      Fix For: 0.5.0
>
>  Attachments: patch.txt, patch.txt, patch.txt
>
>
> A benchmark that runs many small MapReduce tasks in sequence. A single MapReduce implementation is invoked multiple times, and each run takes the output of the previous run as its input. The first map reads its input through TextInputFormat (a text file of a few hundred KB). Input records are passed to the output without much processing. The idea is to benchmark the time taken to initialize the Mapper and Reducer. An initial prototype on a single machine, running 20 MapReduce tasks in sequence, took ~47 seconds per task. Looking for suggestions on what else can be included in the benchmark.
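The chaining described in the issue (one job implementation run repeatedly, each run consuming the previous run's output, with near pass-through map and reduce) can be sketched as below. This is a hypothetical, Hadoop-free model, not the benchmark's actual code: SmallJobsChainSketch, runJob, runChain and ChainResult are illustrative names, and runJob is only a stand-in for submitting one MapReduce job, assuming pure identity map and reduce. It shows the loop structure and where per-job timing would be taken; it does not measure real Mapper/Reducer initialization.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Hadoop-free model of the "many small jobs" benchmark loop.
 * It does NOT use the Hadoop API; runJob() merely stands in for one
 * MapReduce job whose map and reduce pass records straight through.
 */
public class SmallJobsChainSketch {

    /** Result of a chained run: the final output plus per-job timings. */
    static final class ChainResult {
        final List<String> finalOutput;
        final long[] elapsedNanos;

        ChainResult(List<String> finalOutput, long[] elapsedNanos) {
            this.finalOutput = finalOutput;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Stand-in for one job: an identity map followed by an identity reduce. */
    static List<String> runJob(List<String> input) {
        List<String> mapped = new ArrayList<>(input.size());
        for (String record : input) {
            mapped.add(record); // map: emit each input record unchanged
        }
        List<String> reduced = new ArrayList<>(mapped.size());
        for (String record : mapped) {
            reduced.add(record); // reduce: emit each record unchanged
        }
        return reduced;
    }

    /** Runs numJobs jobs in sequence; the output of job i is the input of job i + 1. */
    static ChainResult runChain(List<String> initialInput, int numJobs) {
        if (numJobs < 1) {
            throw new IllegalArgumentException("numJobs must be at least 1");
        }
        long[] elapsed = new long[numJobs];
        List<String> current = initialInput;
        for (int i = 0; i < numJobs; i++) {
            long start = System.nanoTime();
            current = runJob(current); // feed this job's output to the next job
            elapsed[i] = System.nanoTime() - start;
        }
        return new ChainResult(current, elapsed);
    }

    public static void main(String[] args) {
        List<String> input = List.of("line one", "line two", "line three");
        ChainResult result = runChain(input, 20); // 20 jobs, as in the prototype run
        long total = 0;
        for (long t : result.elapsedNanos) {
            total += t;
        }
        System.out.printf("jobs=%d, mean time per job=%.3f ms%n",
                result.elapsedNanos.length, total / 1e6 / result.elapsedNanos.length);
        System.out.println("output unchanged: " + result.finalOutput.equals(input));
    }
}
```

Because the records pass through unchanged, almost all of the time in a real chain would go to per-job setup, which is what the benchmark targets. At the ~47 seconds per task reported above, a 20-job chain would take roughly 20 × 47 = 940 seconds, about 15.7 minutes.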