hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8968) New benchmark throughput tool for striping erasure coding
Date Wed, 04 Nov 2015 19:37:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990244#comment-14990244 ]

Rakesh R commented on HDFS-8968:

Good work [~lirui]. I have a few comments; please take a look.

# Can we make this configurable, like {{System.getProperty("test.benchmark.data", "/tmp/benchmark/data")}}?
private static final String DFS_TMP_DIR = "/tmp/benchmark";
# {{printUsage}} can be highlighted using {{System.err.println}}. Also, we can say {{"Usage:
System.out.println("ErasureCodeBenchmarkThroughput <read|write|gen|clean> "
        + "<size in MB> <ec|rep> [num clients] [stf|pos]\n" +
        "Stateful and positional option is only available for read.");
# It would be good to use the Hadoop utility {{StopWatch}} for the elapsed time computations.
Presently it is using {{(System.currentTimeMillis() - start) / 1000.0}}.
Sample usage:
    org.apache.hadoop.util.StopWatch sw = new StopWatch().start();
    // do the operation
    long elapsedtime = sw.now(TimeUnit.SECONDS);
# Just a suggestion: use {{java.util.concurrent.ExecutorCompletionService}} here, which hands back
tasks as they complete, rather than trying to find out which task has completed.
+    for (Future<Long> future : futures) {
+      results.add(future.get());
+    }
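
To illustrate the {{ExecutorCompletionService}} suggestion, here is a minimal, self-contained sketch. It is not taken from the patch: the class name {{CompletionOrderSketch}}, the method {{runAll}}, and the {{numClients}} parameter are hypothetical, and the tasks are stand-ins for the benchmark clients. It only uses {{java.util.concurrent}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not the code in the patch.
public class CompletionOrderSketch {

  // Runs every task on a fixed-size pool and returns the results
  // in the order the tasks complete, not the order they were submitted.
  static List<Long> runAll(List<Callable<Long>> tasks, int numClients)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numClients);
    CompletionService<Long> cs = new ExecutorCompletionService<>(pool);
    try {
      for (Callable<Long> task : tasks) {
        cs.submit(task);
      }
      List<Long> results = new ArrayList<>(tasks.size());
      for (int i = 0; i < tasks.size(); i++) {
        // take() blocks until some task has finished; get() then returns
        // its result or rethrows its failure as an ExecutionException.
        results.add(cs.take().get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<Callable<Long>> tasks = new ArrayList<>();
    for (long i = 1; i <= 4; i++) {
      final long bytes = i * 1024L;
      tasks.add(() -> bytes); // stand-in for a client reporting bytes processed
    }
    long total = 0;
    for (long r : runAll(tasks, 2)) {
      total += r;
    }
    System.out.println("total bytes: " + total);
  }
}
```

With this shape, a failure in any client surfaces as soon as that client finishes, instead of only after every earlier future in the list has been waited on.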

bq. As to unit test, maybe I can add a test where the tool runs against a MiniDFSCluster.
How about running both a real cluster and a MiniDFSCluster inside the ErasureCodeBenchmarkThroughput
tool, similar to the {{org.apache.hadoop.hdfs.BenchmarkThroughput}}?

> New benchmark throughput tool for striping erasure coding
> ---------------------------------------------------------
>                 Key: HDFS-8968
>                 URL: https://issues.apache.org/jira/browse/HDFS-8968
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Rui Li
>         Attachments: HDFS-8968-HDFS-7285.1.patch, HDFS-8968-HDFS-7285.2.patch, HDFS-8968.3.patch
> We need a new benchmark tool to measure the throughput of client writing and reading, considering the following cases or factors:
> * 3-replica or striping;
> * write or read, stateful read or positional read;
> * which erasure coder;
> * striping cell size;
> * concurrent readers/writers using processes or threads.
> The tool should be easy to use and should avoid unnecessary local environment impact, such as local disk.

This message was sent by Atlassian JIRA
