Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <26209994.1194001790916.JavaMail.jira@brutus>
Date: Fri, 2 Nov 2007 04:09:50 -0700 (PDT)
From: "Konstantin Shvachko (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-2000) Re-write NNBench to use MapReduce
In-Reply-To: <15554700.1191620390866.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539565 ] 

Konstantin Shvachko commented on HADOOP-2000:
---------------------------------------------

# Redundant imports:
{code}
import java.text.DateFormat;
import org.apache.hadoop.mapred.Reducer;
{code}
# The variable {{name}} in NNBenchMapper.map() is never used.
# Typo ("dfined" should be "defined"):
{code}
    // Set user-dfined parameters,
{code}
# The output is labeled TPS, but the value printed is TPmS. The label and the value should use the same unit:
{code}
    "       RAW DATA: TPS Total : " + totalTimeTPmS,
{code}
# The name {{double totalTimeTPS}} is confusing: according to the formula and the comments, the variable holds a TPS value, not a time.
# I am not happy with the whole concept of transactions per second.
You measure the total execution time t_i of each map and then compute Number_of_files / Sum(t_i).
But Sum(t_i) is not the right time, because the maps run in parallel,
so to obtain the true TPS you need to time the start and the end of +*all*+ maps
rather than the start and the end of +*individual*+ maps.
However, it is hard to get the exact start and end times of the job's map stage.
Your proposed TPS measures the number of transactions per second of a single client under a certain load on the cluster.
This is not completely unreasonable, but imo it does not say much as a benchmark result:
it is quite clear that the more load the cluster bears, the slower the clients run.
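To make the difference between the two TPS definitions concrete, here is a minimal Java sketch. It assumes that each map can report its own wall-clock start time, end time and operation count; the class {{TpsSketch}}, the type {{MapTiming}} and both method names are hypothetical illustrations, not code from the HADOOP-2000 patch. The "aggregate" variant approximates the map stage by the earliest start and latest end among the maps, which also assumes the task clocks on different nodes are comparable. Both methods convert milliseconds to seconds explicitly, so the value matches its TPS label.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not code from the HADOOP-2000 patch) contrasting two
 * ways of deriving transactions per second from per-map timings.
 */
public class TpsSketch {

    /** Illustrative per-map record: wall-clock start/end in ms and operation count. */
    public static final class MapTiming {
        final long startMs;
        final long endMs;
        final long ops;

        public MapTiming(long startMs, long endMs, long ops) {
            this.startMs = startMs;
            this.endMs = endMs;
            this.ops = ops;
        }
    }

    /**
     * Total ops divided by the SUM of individual map durations, converted from
     * ops/ms to ops/s. With parallel maps this is roughly the rate of one client.
     */
    public static double perClientTps(List<MapTiming> maps) {
        long ops = 0;
        long sumMs = 0;
        for (MapTiming m : maps) {
            ops += m.ops;
            sumMs += m.endMs - m.startMs;
        }
        return ops * 1000.0 / sumMs;
    }

    /**
     * Total ops divided by the span from the earliest map start to the latest
     * map end, i.e. the wall-clock length of the map stage (assumes the task
     * clocks are comparable across nodes).
     */
    public static double aggregateTps(List<MapTiming> maps) {
        long ops = 0;
        long firstStart = Long.MAX_VALUE;
        long lastEnd = Long.MIN_VALUE;
        for (MapTiming m : maps) {
            ops += m.ops;
            firstStart = Math.min(firstStart, m.startMs);
            lastEnd = Math.max(lastEnd, m.endMs);
        }
        return ops * 1000.0 / (lastEnd - firstStart);
    }

    public static void main(String[] args) {
        // Two maps that overlap completely: 1000 ops each in 2 seconds.
        List<MapTiming> maps = Arrays.asList(
                new MapTiming(0, 2000, 1000),
                new MapTiming(0, 2000, 1000));
        System.out.println(perClientTps(maps)); // prints 500.0
        System.out.println(aggregateTps(maps)); // prints 1000.0
    }
}
```

If the two maps ran back to back instead, both methods would return 500.0; it is only the overlap between maps that makes Number_of_files / Sum(t_i) understate the cluster-wide rate.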

> Re-write NNBench to use MapReduce
> ---------------------------------
>
>                 Key: HADOOP-2000
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2000
>             Project: Hadoop
>          Issue Type: Test
>          Components: test
>    Affects Versions: 0.15.0
>            Reporter: Mukund Madhugiri
>            Assignee: Mukund Madhugiri
>             Fix For: 0.16.0
>
>         Attachments: HADOOP-2000.patch, HADOOP-2000.patch, HADOOP-2000.patch, HADOOP-2000.patch, HADOOP-2000.patch
>
>
> The proposal is to re-write the NNBench benchmark/test to measure Namenode operations using MapReduce. Two buckets of measurements will be done:
> 1. Transactions per second 
> 2. Average latency
> for these operations
> - Create and Close file
> - Open file
> - Rename file
> - Delete file
