accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-2574) Define storage data structure for data that needs replication
Date Fri, 28 Mar 2014 01:09:15 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950233#comment-13950233 ]

Josh Elser edited comment on ACCUMULO-2574 at 3/28/14 1:07 AM:
---------------------------------------------------------------

A nice property of this data structure would be the following:

{noformat}
hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid => { 'offset':[0,100] }
{noformat}

This record states that the given WAL has updates from offset 0 to 100 that are ready to be replicated.

{noformat}
hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid => {'offset':[100,200] }
{noformat}

More data is then ingested into the same WAL. With a combiner set on the table that stores
these records, the two records would ideally be merged automatically into:

{noformat}
hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid => {'offset':[0,200] }
{noformat}
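
To make the intended merge concrete, here is a minimal sketch in plain Python of the reduction a combiner would perform over the values stored for one WAL key. It is not Accumulo's combiner API; the function name `merge_offsets` and the tuple representation of an offset range are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical representation: one (start, end) tuple per stored 'offset' value.
Range = Tuple[int, int]


def merge_offsets(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or adjacent offset ranges into the fewest ranges."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # This range touches or overlaps the previous one: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # There is a gap before this range: keep it separate.
            merged.append((start, end))
    return merged


# The two records from the example above collapse into a single range.
print(merge_offsets([(0, 100), (100, 200)]))  # [(0, 200)]
```

Ranges separated by a gap stay separate in this sketch, so it never claims that bytes which were not reported are ready to replicate.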

This would allow us to update our internal view of what is ready to be replicated at a different
rate than the data is actually being replicated. For example, a delete to the same record could
subtract from the offsets that still need to be replicated. This would tolerate intermittent
replication failures or server failures: the entire replication does not need to occur in one sitting.
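
One possible reading of "subtracting" from the offsets, sketched in plain Python: after each replication attempt, the record is narrowed to the portion that has not yet been shipped. The helper `remaining_after` and its `replicated_to` parameter are hypothetical names, not anything defined in this issue.

```python
from typing import Optional, Tuple

# Hypothetical representation of the 'offset' value as (start, end).
Range = Tuple[int, int]


def remaining_after(ready: Range, replicated_to: int) -> Optional[Range]:
    """Return the part of `ready` still awaiting replication, or None if done."""
    start, end = ready
    # Clamp progress into the range so stale or overshooting reports are harmless.
    new_start = max(start, min(replicated_to, end))
    return (new_start, end) if new_start < end else None


ready = (0, 200)
# A first attempt ships offsets up to 150 and then fails part-way.
ready = remaining_after(ready, 150)
print(ready)  # (150, 200)
# A later attempt finishes the rest, so nothing remains to replicate.
print(remaining_after(ready, 200))  # None
```

Because progress is recorded per attempt, a failed attempt only needs to be resumed from the last recorded offset rather than restarted from the beginning.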



> Define storage data structure for data that needs replication
> -------------------------------------------------------------
>
>                 Key: ACCUMULO-2574
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2574
>             Project: Accumulo
>          Issue Type: Sub-task
>            Reporter: Josh Elser
>             Fix For: 1.7.0
>
>
> We need to track data that needs replication. At a minimum we need to track where the
> data came from (to support cycles in the replication graph) and, optionally, offsets into
> the file that needs replicating (important for WALs, to avoid having to wait for a WAL to
> be closed before replicating).
> It might make sense to include where the data should be replicated to. Not sure if it
> makes sense to do that as late as possible or earlier on.
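
The fields named in the description above could be grouped as in the following Python sketch. Every name here (`ReplicationRecord`, `file_uri`, `source`, `offset`, `targets`, `should_send_to`) is hypothetical, and the cycle check is only one assumed way that tracking the origin could be used.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReplicationRecord:
    """Hypothetical record for one file that needs replication."""

    file_uri: str                             # file (e.g. a WAL) holding the data
    source: str                               # where the data originally came from
    offset: Optional[Tuple[int, int]] = None  # optional range within the file
    targets: List[str] = field(default_factory=list)  # optional destinations

    def should_send_to(self, peer: str) -> bool:
        """Skip the originating system so a replication cycle does not loop."""
        return peer != self.source


record = ReplicationRecord(
    file_uri="hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid",
    source="primary",  # assumed instance name, for illustration only
    offset=(0, 100),
)
print(record.should_send_to("peer1"))    # True
print(record.should_send_to("primary"))  # False
```

Leaving `targets` empty by default reflects the open question in the description about whether destinations should be decided early or as late as possible.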



