accumulo-notifications mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2668) slow WAL writes
Date Tue, 15 Apr 2014 04:41:17 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969216#comment-13969216 ]

ASF subversion and git services commented on ACCUMULO-2668:
-----------------------------------------------------------

Commit e4cef7f209551ebe17e43058e182ca22f8f89293 in accumulo's branch refs/heads/1.6.0-SNAPSHOT
from [~parkjsung]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e4cef7f ]

ACCUMULO-2668 Override the write method which takes a byte[] to call the efficient method
on the wrapped OutputStream

FilterOutputStream's implementation for this write method is horribly inefficient,
and causes a massive degradation in ingest performance.

Signed-off-by: Josh Elser <elserj@apache.org>
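
For readers unfamiliar with the problem the commit message describes: java.io.FilterOutputStream's inherited write(byte[], int, int) forwards the buffer to the wrapped stream one byte at a time through write(int). The sketch below is not the actual patch. It is a minimal illustration with hypothetical class names (CountingSink, DefaultFilterStream, BulkFilterStream, FilterWriteDemo) that counts how many write calls reach the wrapped stream with and without the override. The no-op flush() is an assumption inferred from the name NoFlushOutputStream.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical demo driver; not part of Accumulo. */
class FilterWriteDemo {
  /** Writes `size` bytes through a filter stream and returns the calls seen by the sink. */
  static long callsFor(boolean bulk, int size) {
    CountingSink sink = new CountingSink();
    try (OutputStream s = bulk ? new BulkFilterStream(sink) : new DefaultFilterStream(sink)) {
      s.write(new byte[size], 0, size);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.calls;
  }

  public static void main(String[] args) {
    System.out.println("default: " + callsFor(false, 4096) + " calls"); // default: 4096 calls
    System.out.println("bulk:    " + callsFor(true, 4096) + " calls");  // bulk:    1 calls
  }
}

/** Hypothetical sink that counts write calls instead of storing data. */
class CountingSink extends OutputStream {
  long calls = 0;
  long bytes = 0;

  @Override
  public void write(int b) {
    calls++;
    bytes++;
  }

  @Override
  public void write(byte[] b, int off, int len) {
    calls++;
    bytes += len;
  }
}

/** Inherits FilterOutputStream's byte-at-a-time write(byte[], int, int). */
class DefaultFilterStream extends FilterOutputStream {
  DefaultFilterStream(OutputStream out) {
    super(out);
  }

  @Override
  public void flush() {
    // Assumed no-op, mirroring the "no flush" behaviour implied by the class name.
  }
}

/** Overrides write(byte[], int, int) to hand the whole range to the wrapped stream. */
class BulkFilterStream extends FilterOutputStream {
  BulkFilterStream(OutputStream out) {
    super(out);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len); // one call on the wrapped stream instead of len calls
  }

  @Override
  public void flush() {
    // Assumed no-op, as above.
  }
}
```

With a 4096-byte buffer the inherited method makes 4096 calls on the wrapped stream, while the override makes one. When the wrapped stream does real work per call, that difference is plausibly where the ingest slowdown comes from.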


> slow WAL writes
> ---------------
>
>                 Key: ACCUMULO-2668
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Jonathan Park
>            Assignee: Jonathan Park
>            Priority: Blocker
>              Labels: 16_qa_bug
>             Fix For: 1.6.1
>
>         Attachments: ACCUMULO-2668.0.patch.txt, noflush.diff
>
>
> During continuous ingest, we saw over 70% of our ingest time taken up by writes to the
> WAL. When we ran the DfsLogger in isolation (created one outside of the Tserver), we saw
> about 25MB/s throughput, as opposed to nearly 100MB/s from writing directly to an HDFS
> output stream (computed by taking the estimated size of the mutations sent to the DfsLogger
> class divided by the time it took for it to flush + sync the data to HDFS).
> After investigating, we found one possible culprit was the NoFlushOutputStream. It is
> a subclass of java.io.FilterOutputStream but does not override the write(byte[], int, int)
> method. The javadoc indicates that subclasses of FilterOutputStream should provide
> a more efficient implementation of this method.
> I've attached a small diff that illustrates and addresses the issue, but this may not
> be how we ultimately want to fix it.
> As a side note, I may be misreading the implementation of DfsLogger, but it looks like
> we always make use of the NoFlushOutputStream, even if encryption isn't enabled. There appears
> to be a faulty check in the DfsLogger.open() implementation that I don't believe can be
> satisfied (line 384).
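
The throughput figures in the description are computed as estimated mutation size divided by the time taken to flush and sync. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, the input numbers are illustrative rather than taken from the ticket, and treating 1 MB as 1,000,000 bytes is an assumption, since the ticket does not say which convention was used.

```java
/** Hypothetical helper; not part of Accumulo or the attached patch. */
class ThroughputSketch {
  /**
   * Returns throughput in MB/s, assuming 1 MB = 1,000,000 bytes.
   * elapsedNanos would cover the write plus the flush + sync of the data.
   */
  static double megabytesPerSecond(long bytes, long elapsedNanos) {
    if (elapsedNanos <= 0) {
      throw new IllegalArgumentException("elapsedNanos must be positive");
    }
    double seconds = elapsedNanos / 1e9;
    return (bytes / 1e6) / seconds;
  }

  public static void main(String[] args) {
    // Illustrative inputs only: 50,000,000 bytes written in 2 seconds.
    long bytes = 50_000_000L;
    long elapsedNanos = 2_000_000_000L;
    System.out.println(megabytesPerSecond(bytes, elapsedNanos) + " MB/s"); // 25.0 MB/s
  }
}
```

In a real measurement, elapsedNanos would come from System.nanoTime() taken before the first write and after the sync returns.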



--
This message was sent by Atlassian JIRA
(v6.2#6252)
