Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Date: Fri, 28 Mar 2014 01:07:16 +0000 (UTC)
From: "Josh Elser (JIRA)" <jira@apache.org>
To: notifications@accumulo.apache.org
Message-ID: <JIRA.12704156.1395967441921.21324.1395968836568@arcas>
In-Reply-To: <JIRA.12704156.1395967441921@arcas>
References: <JIRA.12704156.1395967441921@arcas>
Subject: [jira] [Commented] (ACCUMULO-2574) Define storage data structure
 for data that needs replication
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/ACCUMULO-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950233#comment-13950233 ] 

Josh Elser commented on ACCUMULO-2574:
--------------------------------------

A nice property of this data structure would be the following:

{noformat}
hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid => { 'offset':[0,100] }
{noformat}

This record states that the given WAL has updates from offset 0 to 100 that are ready to be replicated.

{noformat}
hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid => { 'offset':[100,200] }
{noformat}

More data is then ingested into the same WAL, producing the second record above. With a combiner set on the table that stores these records, the two records would ideally be merged automatically into:

{noformat}
hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid => { 'offset':[0,200] }
{noformat}

This would allow us to update our internal view of what is ready to be replicated at a different rate than data is actively being replicated. For example, a delete to the same record could subtract from the offsets that still need to be replicated. This would tolerate intermittent replication failures or server failure, because the entire replication does not need to occur in one sitting.
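
To make the desired merge behaviour concrete, here is a minimal sketch in Python. It is not Accumulo's Combiner API; the function names `merge_offsets` and `subtract_offsets` are hypothetical, and it assumes each offset pair is a half-open range (start inclusive, end exclusive), which the records above do not specify.

```python
from typing import Iterable, List, Tuple

# A (start, end) offset pair into a WAL. Assumed half-open: [start, end).
Range = Tuple[int, int]


def merge_offsets(ranges: Iterable[Range]) -> List[Range]:
    """Merge overlapping or touching offset ranges for a single WAL."""
    merged: List[List[int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def subtract_offsets(ranges: Iterable[Range], done: Range) -> List[Range]:
    """Remove an already-replicated range from the ranges still pending."""
    d_start, d_end = done
    remaining: List[Range] = []
    for start, end in ranges:
        if d_end <= start or d_start >= end:
            # No overlap with the replicated range: keep as-is.
            remaining.append((start, end))
            continue
        if start < d_start:
            remaining.append((start, d_start))
        if d_end < end:
            remaining.append((d_end, end))
    return remaining


pending = merge_offsets([(0, 100), (100, 200)])
after_partial = subtract_offsets(pending, (0, 150))
```

Here `pending` holds the single merged range for the WAL, and `after_partial` holds whatever is left once offsets 0 to 150 have been replicated, so a failed or interrupted replication can resume from the remaining range.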

> Define storage data structure for data that needs replication
> -------------------------------------------------------------
>
>                 Key: ACCUMULO-2574
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2574
>             Project: Accumulo
>          Issue Type: Sub-task
>            Reporter: Josh Elser
>             Fix For: 1.7.0
>
>
> We need to track data that needs replication. At a minimum we need to track where the data came from (to support cycles in the replication graph) and, optionally, offsets into the file that needs replicating (important for WALs, to avoid having to wait for a WAL to be closed before replicating).
> It might make sense to include where the data should be replicated to. Not sure if it makes sense to do that as late as possible or earlier on.

