Date: Tue, 15 Apr 2014 04:39:14 +0000 (UTC)
From: "Jonathan Park (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Updated] (ACCUMULO-2668) slow WAL writes

     [ https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Park updated ACCUMULO-2668:
------------------------------------
    Attachment: ACCUMULO-2668.0.patch.txt

Reuploading file from format-patch command

microbenchmark:
- Ran continuous ingest on my laptop (2013 mbp: 2.6 GHz quad core i7, 16 GB RAM) using default 3GB accumulo config using native maps. Used a single continuous ingester instance against a table with 4 tablets.

results:
with fix: 120K entries/s, 12.66 MB/s
without fix: 83K entries/s, 9.05 MB/s

These numbers were sampled at a single point in time during the ingest.

> slow WAL writes
> ---------------
>
>                 Key: ACCUMULO-2668
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Jonathan Park
>            Assignee: Jonathan Park
>            Priority: Blocker
>              Labels: 16_qa_bug
>             Fix For: 1.6.1
>
>         Attachments: ACCUMULO-2668.0.patch.txt, noflush.diff
>
>
> During continuous ingest, we saw over 70% of our ingest time taken up by writes to the WAL. When we ran the DfsLogger in isolation (created one outside of the Tserver), we saw about 25MB/s throughput as opposed to nearly 100MB/s from writing directly to an HDFS output stream. Throughput was computed as the estimated size of the mutations sent to the DfsLogger class divided by the time it took to flush + sync the data to HDFS.
> After investigating, we found one possible culprit was the NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does not override the write(byte[], int, int) method. The javadoc indicates that subclasses of FilterOutputStream should provide a more efficient implementation of that method.
> I've attached a small diff that illustrates and addresses the issue, but this may not be how we ultimately want to fix it.
> As a side note, I may be misreading the implementation of DfsLogger, but it looks like we always make use of the NoFlushOutputStream, even if encryption isn't enabled. There appears to be a faulty check in the DfsLogger.open() implementation that I don't believe can be satisfied (line 384).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
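
For readers unfamiliar with the FilterOutputStream behavior described in the issue, here is a minimal sketch of the effect. It is not the attached patch and not Accumulo's actual NoFlushOutputStream. The class names BulkWriteSketch, CountingSink, PerByteNoFlush and BulkNoFlush are hypothetical and exist only for illustration. The sketch relies on the documented behavior of java.io.FilterOutputStream, whose inherited write(byte[], int, int) calls write(int) once per byte, and it counts how many calls reach the underlying stream with and without an override.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class BulkWriteSketch {

    /** Underlying sink that counts how often each write variant is invoked. */
    static class CountingSink extends OutputStream {
        int singleByteCalls = 0;
        int bulkCalls = 0;
        final ByteArrayOutputStream data = new ByteArrayOutputStream();

        @Override
        public void write(int b) {
            singleByteCalls++;
            data.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            bulkCalls++;
            data.write(b, off, len);
        }
    }

    /** Hypothetical stand-in: suppresses flush() but inherits the per-byte bulk write. */
    static class PerByteNoFlush extends FilterOutputStream {
        PerByteNoFlush(OutputStream out) {
            super(out);
        }

        @Override
        public void flush() {
            // intentionally a no-op
        }
    }

    /** Same idea, but the bulk write is passed straight to the wrapped stream. */
    static class BulkNoFlush extends FilterOutputStream {
        BulkNoFlush(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // one call instead of len calls
        }

        @Override
        public void flush() {
            // intentionally a no-op
        }
    }

    /** Writes the payload through one of the two wrappers and returns the sink for inspection. */
    static CountingSink writeThrough(boolean bulk, byte[] payload) throws IOException {
        CountingSink sink = new CountingSink();
        OutputStream stream = bulk ? new BulkNoFlush(sink) : new PerByteNoFlush(sink);
        stream.write(payload, 0, payload.length);
        stream.flush();
        return sink;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[4096];
        CountingSink slow = writeThrough(false, payload);
        CountingSink fast = writeThrough(true, payload);
        System.out.println("inherited write:  single-byte calls=" + slow.singleByteCalls
                + ", bulk calls=" + slow.bulkCalls);
        System.out.println("overridden write: single-byte calls=" + fast.singleByteCalls
                + ", bulk calls=" + fast.bulkCalls);
    }
}
```

In this sketch the same bytes arrive at the sink either way; only the number of calls differs. Whether that call overhead accounts for the throughput gap reported above depends on the streams wrapped in the real DfsLogger, which this example does not model.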