accumulo-dev mailing list archives

From ke...@deenlo.com
Subject Re: Review Request 19862: Design document for review on cross-cluster replication
Date Wed, 02 Apr 2014 16:07:43 GMT


> On April 2, 2014, 3:15 p.m., Josh Elser wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 62
> > <https://reviews.apache.org/r/19862/diff/1/?file=543190#file543190line62>
> >
> >     Thinking about this from a total-ordering standpoint: say we're replicating
> >     to two slaves, and we have three rfiles (1, 2 and 3) to replicate to those
> >     two slaves.
> >     
> >     We replicate rfile1 to both, but then the link to slave2 goes down. We can
> >     still replicate rfile2 and then rfile3 to slave1 while we try to send
> >     rfile2 to slave2.
> >     
> >     What if, instead of the link being down, we happen to be communicating with
> >     an angry server inside slave2 that never completes the transfer? We don't
> >     want to transfer rfile3 to slave2 in that case, so as to better preserve
> >     global ordering.
> >     
> >     This can be restated as "we only want to replicate one 'file' to a slave at
> >     a time", so that we preserve the original semantics of the replication
> >     "queue" (table). The problem is that this could drastically slow down
> >     replication when one replication task at a time cannot saturate the link
> >     between master and slave.
> >     
> >     This isn't anything that we can reliably guarantee now (without conditional
> >     mutations), right? Is it worth trying to tackle? The one clear change I want
> >     to make is to put the identifier for the slave in the replication record
> >     rather than defer determining where a record should be replicated.

I also think transferring files should be an external concern, like Mike said. One way this
could work is the following:

 1. Cluster A exports a batch of file URIs and a control file (similar to export table).
 2. The user distcps the URIs and the control file.
 3. The control file and the directory containing the distcp'd files are provided to cluster B to import.
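
The export step can be sketched as a stateful exporter that emits only the files added since its previous export, plus a control file recording apply order and which export must come first. This is a minimal sketch, assuming a JSON control file with hypothetical `seq`, `prev_seq` and `files` fields; `StatefulExporter` is a hypothetical name, not an Accumulo API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StatefulExporter:
    """Hypothetical exporter on cluster A that remembers what it has already
    exported, so each export lists only the files added since the last one."""
    exported: set = field(default_factory=set)
    last_seq: int = 0

    def export(self, current_files):
        # current_files: file URIs in the order they should be applied.
        new_files = [f for f in current_files if f not in self.exported]
        seq = self.last_seq + 1
        control = {
            "seq": seq,                 # position of this export
            "prev_seq": self.last_seq,  # export that must be applied first
            "files": new_files,         # apply in list order
        }
        self.exported.update(new_files)
        self.last_seq = seq
        return json.dumps(control)
```

The returned control file would be distcp'd alongside the listed files; how cluster B reports back what it applied is left open, as in the proposal.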

The difference between this and import/export table is that it's stateful. An export on
cluster A provides the list of changes since the last export. The control file contains
ordering information about how to apply the files, as well as ordering information relative
to other imports/exports. But this process is incomplete: the feedback process would need to
be worked out. The entire process should be resilient to users trying to apply things out of
order.
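
Resilience to out-of-order application on cluster B could look like the following minimal sketch. It assumes a hypothetical JSON control file with `seq`, `prev_seq` and `files` fields; `ControlFileImporter` and `OutOfOrderImport` are hypothetical names, and appending to a list stands in for the real import. Re-applying an old control file is a no-op, and skipping ahead is rejected.

```python
import json


class OutOfOrderImport(Exception):
    """Raised when a control file's predecessor has not been applied yet."""


class ControlFileImporter:
    """Hypothetical importer on cluster B that applies control files in order."""

    def __init__(self):
        self.applied_seq = 0     # seq of the last control file applied
        self.applied_files = []  # stand-in for files actually imported

    def apply(self, control_json):
        control = json.loads(control_json)
        if control["seq"] <= self.applied_seq:
            return False  # already applied: ignore duplicates
        if control["prev_seq"] != self.applied_seq:
            raise OutOfOrderImport(
                f"expected prev_seq {self.applied_seq}, got {control['prev_seq']}")
        for uri in control["files"]:
            self.applied_files.append(uri)  # import files in listed order
        self.applied_seq = control["seq"]
        return True
```

This covers only ordering; it does not address the feedback process, which the proposal notes still has to be worked out.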


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39260
-----------------------------------------------------------


On April 1, 2014, 1:58 a.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19862/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 1:58 a.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-378
>     https://issues.apache.org/jira/browse/ACCUMULO-378
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Re-posting a version of the design doc that I own. Contains grammatical fixes from round
> one, with a few extra clarifications. New content should be posted here, but I'll maintain
> the old review as discussion progresses.
> 
> 
> Diffs
> -----
> 
>   docs/src/main/resources/design/ACCUMULO-378-design.mdtext PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19862/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Josh Elser
> 
>

