accumulo-dev mailing list archives

From "Mike Drob" <md...@mdrob.com>
Subject Re: Review Request 19862: Design document for review on cross-cluster replication
Date Wed, 02 Apr 2014 17:01:40 GMT


> On April 2, 2014, 3:36 p.m., Mike Drob wrote:
> >
> 
> Mike Drob wrote:
>     Huh, RB proxy error ate my comment.
>     
>     I was speaking to some of the HBase team about this yesterday, and they mentioned
that they do not support replicated bulk import. Their recommended solution is just to externally
copy files and run bulk import on the slave. Since this is something that is possible for
users to configure themselves, I'd like to make sure we focus on the difficult case of live
ingest.
>     
>     Is the assumption that replication is an all-or-nothing deal? Either you replicate
all of the tables on a system, or you replicate none of them, but just a defined set is not
allowed? I believe the WAL groups mutations by table IDs, so care would need to be taken to
make sure those do not get out of sync.
>     
>     What happens when I clone a table, for example when running an offline MR job? Does
the clone need to be replicated? I assume no. If the slave is a read-only implementation,
can I make clones there to run MR? Maybe another thing that will come out of this is 'transient
clones' that have IDs in a reserved high range that can be reused after they are deleted.
>     
>
> 
> Josh Elser wrote:
>     I believe I already said elsewhere that replication is on a per-table basis. Replication
for tables would (likely) have to be turned on, at which point the offline-MR case isn't a
worry.
> 
> kturner wrote:
>     Why not support replicating bulk imports?  Seems like it makes things easier on users.
> 
> Mike Drob wrote:
>     Then the ID mapping is a worry.
> 
> Josh Elser wrote:
>     When configuring the replication, we would just track the source tableID and the
destination cluster and the destination tableID. Am I missing something?

If we're shipping WALs around, then the slave has to know the mapping from source table ID
to destination table ID. Then you need to have an extra code path that checks for a mapping
before performing "recovery."
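
To make the extra code path concrete, here is a rough sketch of the lookup I have in mind. Everything in it is hypothetical (the TABLE_ID_MAP contents, the function names, the shape of a WAL entry); it is not Accumulo code, just an illustration of checking for a mapping before replaying a mutation on the slave.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical mapping configured on the slave:
# source table ID -> local (destination) table ID.
TABLE_ID_MAP = {"3": "a", "7": "c"}


def local_table_id(source_table_id: str) -> Optional[str]:
    """Return the local table ID for a source table ID, or None if unmapped."""
    return TABLE_ID_MAP.get(source_table_id)


def replay(entries: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Replay shipped WAL entries, given as (source_table_id, mutation) pairs.

    Entries whose table has no mapping are skipped instead of being applied
    to whatever local table happens to share the same ID.
    """
    applied = []
    for source_id, mutation in entries:
        local_id = local_table_id(source_id)
        if local_id is None:
            continue  # no mapping configured: this table is not replicated here
        applied.append((local_id, mutation))
    return applied
```

The point is only that normal recovery never needs this translation step, while replication-driven replay always does.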

If we have cyclic replication, then you have to know which WAL you are shipping, because that
could imply a different mapping: master table x maps to slave table y, which maps to other slave
table z. If we have master-master, then both sides need to know the mapping, so I guess the
table needs to exist on both clusters before replication can be configured (so that we have
a table ID to use in the configuration).
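
A sketch of what "knowing which WAL you are shipping" could look like: the mapping is keyed by the cluster the WAL came from as well as the table ID. The cluster names, the CONFIG layout, and resolve() are all made up for illustration, and the x/y/z IDs are the ones from the example above.

```python
# Hypothetical per-cluster replication config:
# local cluster -> {(origin cluster, origin table ID): local table ID}
CONFIG = {
    "slave1": {("master", "x"): "y"},
    "slave2": {("slave1", "y"): "z"},
}


def resolve(local_cluster: str, wal_origin: str, origin_table_id: str) -> str:
    """Find the local table ID for a WAL shipped from wal_origin.

    Raises LookupError when no mapping exists for that (origin, table) pair,
    which is the case where the table must be created and configured first.
    """
    try:
        return CONFIG[local_cluster][(wal_origin, origin_table_id)]
    except KeyError:
        raise LookupError(
            f"{local_cluster} has no mapping for table {origin_table_id} "
            f"from {wal_origin}"
        )
```

With this layout, a WAL from the master and a WAL from slave1 can carry the same table ID and still resolve differently, which a mapping keyed on table ID alone cannot express.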

Also, if we're shipping WALs around, then it is possible that you have 99 mutations for a
table that isn't replicated and 1 mutation that is replicated. Sending offsets and chunks
can help minimize the bandwidth, but...
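
For illustration, here is the 99-to-1 case as a toy filter. The entry layout (offset, table ID, mutation) and the function name are assumptions of mine, not the actual WAL format; the sketch only shows selecting the replicated entries and remembering their offsets.

```python
from typing import Iterable, List, Set, Tuple

WalEntry = Tuple[int, str, str]  # (offset in WAL, table ID, mutation); hypothetical layout


def select_replicated(wal: Iterable[WalEntry], replicated: Set[str]) -> List[WalEntry]:
    """Keep only the entries whose table ID is configured for replication."""
    return [entry for entry in wal if entry[1] in replicated]


# 99 mutations for a non-replicated table "1" and one for a replicated table "2".
wal = [(i, "1", f"m{i}") for i in range(99)]
wal.insert(57, (99, "2", "replicated-mutation"))

to_ship = select_replicated(wal, {"2"})
```

Here to_ship holds a single entry, but the sender still has to read the whole WAL to find it.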


- Mike


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19862/#review39264
-----------------------------------------------------------


On April 1, 2014, 1:58 a.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19862/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 1:58 a.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-378
>     https://issues.apache.org/jira/browse/ACCUMULO-378
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Re-posting a version of the design doc that I own. Contains grammatical fixes from round
one, with a few extra clarifications. New content should be posted here, but I'll maintain
the old review as discussion progresses.
> 
> 
> Diffs
> -----
> 
>   docs/src/main/resources/design/ACCUMULO-378-design.mdtext PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19862/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Josh Elser
> 
>

