accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Created] (ACCUMULO-2846) Need to re-use DataInputStream for reading files that need replication
Date Tue, 27 May 2014 18:16:04 GMT
Josh Elser created ACCUMULO-2846:

             Summary: Need to re-use DataInputStream for reading files that need replication
                 Key: ACCUMULO-2846
             Project: Accumulo
          Issue Type: Sub-task
          Components: replication
            Reporter: Josh Elser
            Assignee: Josh Elser
             Fix For: 1.7.0

In doing multi-node tests with continuous ingest, I was watching the ingest performance on
the peer via the monitor.

I noticed that the ingest rate had a regular pattern to it, where ingest would spike, then
decrease at a (mostly) fixed interval, flat-line, and then repeat.

I believe each cycle on the ingest graph corresponds to the replication of a file from the
primary. The reduction in throughput tracks the amount of time it takes to re-read the "prefix"
of the file which we already replicated. I need to push some more logic down into the AccumuloReplicaSystem
so that we can avoid that growing penalty for seeking over data which we have already replicated.
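
To make the penalty concrete, here is a minimal sketch of the two read patterns. It is not
the AccumuloReplicaSystem code: the class name, the method names, and the "file" of fixed-size
long records are all hypothetical stand-ins, and an in-memory byte array takes the place of
the real file. It only counts how many records each pattern has to read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names exist in Accumulo.
final class StreamReuseSketch {

  // Builds an in-memory "file" holding `count` long records.
  public static byte[] buildFile(int count) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int i = 0; i < count; i++) {
        out.writeLong(i);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads n records from the stream and returns how many were read.
  private static long readRecords(DataInputStream in, int n) throws IOException {
    for (int i = 0; i < n; i++) {
      in.readLong();
    }
    return n;
  }

  // Opens a fresh stream for every batch, so the already-replicated prefix
  // is read again each time. `batch` must be positive.
  public static long recordsReadWhenReopening(byte[] file, int batch) {
    int total = file.length / Long.BYTES;
    long recordsRead = 0;
    try {
      for (int done = 0; done < total; done += batch) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
          recordsRead += readRecords(in, done); // prefix we already replicated
          recordsRead += readRecords(in, Math.min(batch, total - done)); // new data
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return recordsRead;
  }

  // Keeps one stream open across batches, so each record is read once.
  // `batch` must be positive.
  public static long recordsReadWhenReusing(byte[] file, int batch) {
    int total = file.length / Long.BYTES;
    long recordsRead = 0;
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      for (int done = 0; done < total; done += batch) {
        recordsRead += readRecords(in, Math.min(batch, total - done));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return recordsRead;
  }

  public static void main(String[] args) {
    byte[] file = buildFile(1000);
    System.out.println("reopening: " + recordsReadWhenReopening(file, 100));
    System.out.println("reusing:   " + recordsReadWhenReusing(file, 100));
  }
}
```

With 1000 records replicated in batches of 100, the reopening pattern reads 5500 records
while the reused stream reads 1000. The extra work grows with every batch, which would be
consistent with the growing penalty described above.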

The cost is that this pushes more complexity into the AccumuloReplicaSystem, but I imagine
that after I write an implementation to replicate to some other system, it will become more
obvious which common pieces can be abstracted into a common base class.

This message was sent by Atlassian JIRA
