hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4049) hflush performance regression due to nagling delays
Date Mon, 15 Oct 2012 20:55:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476435#comment-13476435 ]

Aaron T. Myers commented on HDFS-4049:
--------------------------------------

Patch looks really good to me. A few little comments on the test code. +1 once these are addressed:

# In TestMultiThreadedHflush, the repl 1 test passes 3 to doTestMultipleHflushers, so it is in fact also running with repl 3:
{code}
+  public void testMultipleHflushersRepl1() throws Exception {
+    doTestMultipleHflushers(3);
+  }
+  
+  @Test
+  public void testMultipleHflushersRepl3() throws Exception {
+    doTestMultipleHflushers(3);
+  }
{code}
# I recommend renaming the formal parameter from "i" to "replication" or "numDatanodes" in the following:
{code}
+  private void doTestMultipleHflushers(int i) throws Exception {
{code}
# Not changed by your patch, but I think that the constants TestMultiThreadedHflush.numBlocks
and TestMultiThreadedHflush.fileSize are unused. Mind removing them?

In addition to the code review, I also tested the patch on a 4-node physical cluster and confirmed
that it works as expected.

Prior to HDFS-3721:

{noformat}
Finished in 58757ms
Latency quantiles (in microseconds):
50.00 %ile +/- 5.00%: 847
75.00 %ile +/- 2.50%: 954
90.00 %ile +/- 1.00%: 1074
95.00 %ile +/- 0.50%: 1203
99.00 %ile +/- 0.10%: 2073
{noformat}

Post HDFS-3721, without this patch:

{noformat}
Finished in 1032220ms
Latency quantiles (in microseconds):
50.00 %ile +/- 5.00%: 1756
75.00 %ile +/- 2.50%: 41004
90.00 %ile +/- 1.00%: 41231
95.00 %ile +/- 0.50%: 41400
99.00 %ile +/- 0.10%: 42684
{noformat}

With the latest patch included here:

{noformat}
Finished in 58531ms
Latency quantiles (in microseconds):
50.00 %ile +/- 5.00%: 864
75.00 %ile +/- 2.50%: 970
90.00 %ile +/- 1.00%: 1096
95.00 %ile +/- 0.50%: 1237
99.00 %ile +/- 0.10%: 2131
{noformat}
                
> hflush performance regression due to nagling delays
> ---------------------------------------------------
>
>                 Key: HDFS-4049
>                 URL: https://issues.apache.org/jira/browse/HDFS-4049
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, performance
>    Affects Versions: 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-4049.txt, hdfs-4049.txt
>
>
> HDFS-3721 reworked the way that packets are mirrored through the pipeline in the datanode.
The rework issues two write() calls where there used to be one, which interacts badly with nagling
and introduces 40ms bubbles on hflush() calls. We didn't notice this in the tests because
the hflush perf test only uses a single datanode.
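
To illustrate the two-writes-versus-one pattern described above, here is a minimal Java sketch. It is not the HDFS-4049 patch: the names CoalescedWriteSketch, CountingOutputStream, sendSeparately and sendCoalesced are hypothetical, and the counting stream is only a stand-in for a socket's output stream. It shows how buffering a header and a payload and flushing once turns two underlying write() calls into one.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration only; not code from HDFS.
final class CoalescedWriteSketch {

  /** Stand-in for a socket stream: counts the write calls that reach it. */
  static final class CountingOutputStream extends OutputStream {
    int writeCalls = 0;
    long bytes = 0;

    @Override
    public void write(int b) {
      writeCalls++;
      bytes++;
    }

    @Override
    public void write(byte[] buf, int off, int len) {
      writeCalls++;
      bytes += len;
    }
  }

  /** Header and payload go out as two separate writes (the problematic pattern). */
  static void sendSeparately(OutputStream out, byte[] header, byte[] payload)
      throws IOException {
    out.write(header);
    out.write(payload);
    out.flush();
  }

  /** Header and payload are buffered and reach the underlying stream as one write. */
  static void sendCoalesced(OutputStream out, byte[] header, byte[] payload)
      throws IOException {
    // Default buffer size (8192 bytes) comfortably holds both small arrays.
    BufferedOutputStream buffered = new BufferedOutputStream(out);
    buffered.write(header);
    buffered.write(payload);
    buffered.flush(); // single write to the underlying stream
  }

  public static void main(String[] args) throws IOException {
    byte[] header = new byte[8];
    byte[] payload = new byte[64];

    CountingOutputStream separate = new CountingOutputStream();
    sendSeparately(separate, header, payload);

    CountingOutputStream coalesced = new CountingOutputStream();
    sendCoalesced(coalesced, header, payload);

    System.out.println("separate writes:  " + separate.writeCalls);
    System.out.println("coalesced writes: " + coalesced.writeCalls);
  }
}
```

Another common remedy is to disable Nagle's algorithm on the connection with java.net.Socket.setTcpNoDelay(true). The description quoted above does not say which approach the patch takes.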

