Date: Fri, 28 Mar 2014 01:07:16 +0000 (UTC)
From: "Josh Elser (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Commented] (ACCUMULO-2574) Define storage data structure for data that needs replication

[ https://issues.apache.org/jira/browse/ACCUMULO-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950233#comment-13950233 ]

Josh Elser commented on ACCUMULO-2574:
--------------------------------------

A nice property of this data structure would be the following:

{noformat}
hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid => { 'offset':[0,100] }
{noformat}

This record defines that the given WAL has updates from offset 0 to 100 that can be replicated.

{noformat}
hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid => {'offset':[100,200] }
{noformat}

More data is ingested to the same WAL.
By setting a combiner on the table that stores these records, we would like the records to be merged automatically into

{noformat}
hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid => {'offset':[0,200] }
{noformat}

This would allow us to update our internal view of what is ready to be replicated at a different rate than what is actively being replicated. For example, a delete to the same record could subtract from the offset needed to replicate. This would tolerate intermittent replication failures or server failures: the entire replication does not need to occur in one sitting.

> Define storage data structure for data that needs replication
> -------------------------------------------------------------
>
> Key: ACCUMULO-2574
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2574
> Project: Accumulo
> Issue Type: Sub-task
> Reporter: Josh Elser
> Fix For: 1.7.0
>
>
> We need to track data that needs replication. At a minimum we need to track where the data came from (to support cycles in the replication graph) and, optionally, offsets into the file that needs replicating (important for WALs, to avoid having to wait for a WAL to be closed before replicating).
> It might make sense to include where the data should be replicated to. Not sure if it makes sense to do that as late as possible or earlier on.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
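
For illustration only, the merge behaviour described in the comment could be sketched roughly as follows. This is plain Python, not an Accumulo Combiner, and it is not taken from the issue: the function names reduce_offsets and subtract_replicated, the tuple representation of an offset range, and the rule that touching ranges are joined are all assumptions made for this sketch.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of one 'offset' value: (begin, end) into a WAL.
Range = Tuple[int, int]


def reduce_offsets(ranges: Iterable[Range]) -> List[Range]:
    """Merge overlapping or touching offset ranges recorded for one WAL key.

    This plays the role a combiner would play: many small records for the
    same key collapse into the fewest ranges that cover the same offsets.
    """
    merged: List[Range] = []
    for begin, end in sorted(ranges):
        if merged and begin <= merged[-1][1]:
            # Touches or overlaps the previous range, so extend it.
            last_begin, last_end = merged[-1]
            merged[-1] = (last_begin, max(last_end, end))
        else:
            # There is a gap, so keep this as a separate range.
            merged.append((begin, end))
    return merged


def subtract_replicated(ranges: Iterable[Range], replicated_up_to: int) -> List[Range]:
    """Drop the part of each range that lies below replicated_up_to.

    Assumed reading of "a delete could subtract from the offset needed to
    replicate": everything before replicated_up_to is already done.
    """
    remaining: List[Range] = []
    for begin, end in ranges:
        if end <= replicated_up_to:
            continue  # Fully replicated, nothing left to track.
        remaining.append((max(begin, replicated_up_to), end))
    return remaining


wal = "hdfs://nn1:8020/accumulo/wal/tserver1:1234/uuid"
pending = reduce_offsets([(0, 100), (100, 200)])
print(wal, "=>", {"offset": pending})
print(wal, "=>", {"offset": subtract_replicated(pending, 50)})
```

With the two records from the comment, the first call yields a single range covering offsets 0 to 200. The second call shows how partial progress (here, an assumed 50 bytes already replicated) would shrink what remains, which is what lets replication stop and resume after a failure.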