Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Date: Tue, 27 May 2014 19:14:02 +0000 (UTC)
From: "Josh Elser (JIRA)" <jira@apache.org>
To: notifications@accumulo.apache.org
Message-ID: <JIRA.12716868.1401214497045.25117.1401218042181@arcas>
In-Reply-To: <JIRA.12716868.1401214497045@arcas>
References: <JIRA.12716868.1401214497045@arcas>
Subject: [jira] [Comment Edited] (ACCUMULO-2846) Need to re-use
 DataInputStream for reading files that need replication
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/ACCUMULO-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010148#comment-14010148 ] 

Josh Elser edited comment on ACCUMULO-2846 at 5/27/14 7:13 PM:
---------------------------------------------------------------

For context, I reproduced the initial ingest graph that I saw on the peer, which led me to this problem.


was (Author: elserj):
I reproduced the initial graph that I saw which led me to this problem for context.

> Need to re-use DataInputStream for reading files that need replication
> ----------------------------------------------------------------------
>
>                 Key: ACCUMULO-2846
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2846
>             Project: Accumulo
>          Issue Type: Sub-task
>          Components: replication
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.7.0
>
>         Attachments: ingest-graph.jpg
>
>
> While running multi-node tests with continuous ingest, I was watching the ingest performance on the peer via the monitor.
> I noticed that the ingest rate followed a regular pattern: ingest would spike, then decrease at a (mostly) fixed interval, flat-line, and then repeat.
> I believe each cycle on the ingest graph is the replication of a file from the primary. The reduction in throughput corresponds to the amount of time it takes to re-read the "prefix" of the file that we already replicated. I need to push some more logic down into the AccumuloReplicaSystem so that we can avoid that growing penalty for seeking over data that we don't need to re-process.
> The cost is that this pushes more complexity into the AccumuloReplicaSystem, but I imagine that, after I write an implementation that replicates to some other system, it will become more obvious where the common points live that can be abstracted into a common base class.
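
To illustrate the penalty described above, here is a minimal Java sketch. It is not the AccumuloReplicaSystem code: the class name, the method names, and the "file" of fixed-size int records are all hypothetical, chosen only to show the effect. It compares re-opening the file for every batch (which re-reads the already-replicated prefix each time) with keeping one DataInputStream open across batches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names exist in Accumulo.
final class ReplicationReadSketch {

  // Builds an in-memory stand-in for a file: `count` int records.
  static byte[] writeRecords(int count) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int i = 0; i < count; i++) {
        out.writeInt(i);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Re-opens the file for every batch, so each batch first re-reads the
  // prefix that was already replicated. Returns the total records read.
  static long replicateByReopening(byte[] file, int total, int batchSize) {
    long recordsRead = 0;
    for (int replicated = 0; replicated < total; replicated += batchSize) {
      try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
        // Skip over the prefix by reading and discarding it.
        for (int i = 0; i < replicated; i++) {
          in.readInt();
          recordsRead++;
        }
        // Read the records that make up this batch.
        int end = Math.min(total, replicated + batchSize);
        for (int i = replicated; i < end; i++) {
          in.readInt();
          recordsRead++;
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    return recordsRead;
  }

  // Keeps a single stream open, so each batch resumes where the last ended.
  static long replicateWithReusedStream(byte[] file, int total, int batchSize) {
    long recordsRead = 0;
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      for (int replicated = 0; replicated < total; replicated += batchSize) {
        int end = Math.min(total, replicated + batchSize);
        for (int i = replicated; i < end; i++) {
          in.readInt();
          recordsRead++;
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return recordsRead;
  }
}
```

With 6 records and a batch size of 2, the re-opening approach reads 2 + 4 + 6 = 12 records in total, while the reused stream reads 6. The gap grows with file length, which would be consistent with the growing penalty described in the issue.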

