accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-2846) Need to re-use DataInputStream for reading files that need replication
Date Tue, 27 May 2014 19:32:01 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated ACCUMULO-2846:
---------------------------------

    Attachment: patched-ingest-graph.jpg

Testing was done with a patched version in which the primary no longer performs excessive
re-reading in the common case. The ingest rate on the peer is much more stable and stays
relatively constant per file.

> Need to re-use DataInputStream for reading files that need replication
> ----------------------------------------------------------------------
>
>                 Key: ACCUMULO-2846
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2846
>             Project: Accumulo
>          Issue Type: Sub-task
>          Components: replication
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.7.0
>
>         Attachments: ingest-graph.jpg, patched-ingest-graph.jpg
>
>
> While doing multi-node tests with continuous ingest, I was watching the ingest performance
> on the peer via the monitor.
> I noticed that the ingest rate had a regular pattern to it: ingest would spike, then
> decrease by a (mostly) fixed interval, flat-line, and then repeat.
> I believe each cycle on the ingest graph is the replication of one file from the primary.
> The reduction in throughput is proportional to the amount of time it takes to re-read the
> "prefix" of the file which we already replicated. I need to push some more logic down into
> the AccumuloReplicaSystem so that we can avoid that growing penalty for seeking over data
> which we don't need to re-process.
> The cost is that it pushes more complexity into the AccumuloReplicaSystem, but I imagine
> that after I write an implementation to replicate to some other system, it will become more
> obvious where the common points live that can be abstracted into a common base class.
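
The growing penalty described above can be illustrated with a small stand-alone sketch. This is
not the AccumuloReplicaSystem code or the actual patch; the class name, the method names, and the
in-memory "file" of long values are all hypothetical stand-ins. It only counts how many records
get read when a stream is reopened for every batch (re-reading the already replicated prefix)
versus when a single DataInputStream is kept open and re-used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; none of these names come from Accumulo.
class StreamReuseSketch {

  // Build an in-memory "file" of `records` long values (stand-in for a file to replicate).
  static byte[] buildFile(int records) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int i = 0; i < records; i++) {
        out.writeLong(i);
      }
    }
    return bytes.toByteArray();
  }

  // Reopen the stream for every batch: each batch first re-reads the replicated prefix.
  static long readsWithReopen(byte[] file, int records, int batchSize) throws IOException {
    long reads = 0;
    int replicated = 0;
    while (replicated < records) {
      try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
        for (int i = 0; i < replicated; i++) { // wasted work that grows with each batch
          in.readLong();
          reads++;
        }
        int n = Math.min(batchSize, records - replicated);
        for (int i = 0; i < n; i++) { // the records actually being replicated
          in.readLong();
          reads++;
        }
        replicated += n;
      }
    }
    return reads;
  }

  // Keep one stream open across batches: every record is read exactly once.
  static long readsWithReuse(byte[] file, int records, int batchSize) throws IOException {
    long reads = 0;
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      int replicated = 0;
      while (replicated < records) {
        int n = Math.min(batchSize, records - replicated);
        for (int i = 0; i < n; i++) {
          in.readLong();
          reads++;
        }
        replicated += n;
      }
    }
    return reads;
  }

  public static void main(String[] args) throws IOException {
    byte[] file = buildFile(1000);
    System.out.println("reopen per batch: " + readsWithReopen(file, 1000, 100)); // 5500
    System.out.println("re-used stream:   " + readsWithReuse(file, 1000, 100));  // 1000
  }
}
```

With 1000 records replicated in batches of 100, the reopen variant performs 5500 record reads
against 1000 for the re-used stream, and the gap widens as the file grows, which is consistent
with the repeating per-file slowdown seen on the ingest graph.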



--
This message was sent by Atlassian JIRA
(v6.2#6252)
