accumulo-dev mailing list archives

From "Josh Elser" <josh.el...@gmail.com>
Subject Re: Review Request 19790: ACCUMULO-378 Design document
Date Mon, 31 Mar 2014 16:39:56 GMT


> On March 31, 2014, 4:21 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 34
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line34>
> >
> >     Can a table be replicated to multiple clusters?
> 
> kturner wrote:
>     More specifically, can a table on one cluster be replicated to multiple clusters directly?
>     The graph described seemed to imply only one outgoing edge. I am just wondering about multiple
>     outgoing edges from a single cluster. It seems like this would impact the implementation
>     of bookkeeping for what files were replicated where.

No, the intent was to support replication from one cluster to N clusters. We could make this
detail transparent by including the destination in the table where we store references to data
to be replicated, at the cost of storing N*M records instead of just M records. N is the number
of clusters the source is replicating to, while M is the number of references to data that
needs to be replicated. The more I think about it, the more I think it's definitely worth
it.
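
To make the N*M cost concrete, here is a rough sketch of the bookkeeping, not the actual design. The file and cluster names, and the helper functions, are made up for illustration; the only point is that keying each record on (reference, destination) gives one record per pair and lets us track progress per destination.

```python
from itertools import product

def replication_records(file_refs, destinations):
    """Build one 'replicated?' flag per (file reference, destination) pair."""
    return {(ref, dest): False for ref, dest in product(file_refs, destinations)}

def fully_replicated(records, ref):
    """A reference is done only when every destination has received it."""
    return all(done for (r, _dest), done in records.items() if r == ref)

# Hypothetical inputs: M = 3 references, N = 2 destination clusters.
refs = ["wal-0001", "wal-0002", "wal-0003"]
peers = ["clusterB", "clusterC"]

records = replication_records(refs, peers)

# Mark wal-0001 as shipped to only one of the two destinations.
records[("wal-0001", "clusterB")] = True

print(len(records))                           # N * M records
print(fully_replicated(records, "wal-0001"))  # still pending for clusterC
```

With the destination in the key, a slow or unreachable peer only holds back its own records rather than the whole reference.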


> On March 31, 2014, 4:21 p.m., kturner wrote:
> > docs/src/main/resources/design/ACCUMULO-378-design.mdtext, line 80
> > <https://reviews.apache.org/r/19790/diff/1/?file=539855#file539855line80>
> >
> >     What's the rationale for replicating WALs as opposed to replicating minor-compacted
> >     RFiles? What are the pros and cons? One con w/ WALs is that they could possibly contain a
> >     lot of data for tables that are not being replicated. This data would need to be filtered.

The biggest reason for using them is that they drastically reduce the latency for data to
*begin* the replication process. We certainly could use RFiles for everything, which would
simplify things, but I'm worried about the latency that would incur. If we used RFiles, the
only solution I can come up with to reduce that latency before replication even begins would
be to increase the minc frequency. Maybe that's sufficient for a first pass? I think I need
to back this opinion up with some numbers.

Right now, we tend to recommend a bigger in-memory map for increased ingest performance. The
worry here would be that recommendation now comes with increased replication latency.
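
As a back-of-the-envelope illustration of that worry (not a measurement), the sketch below estimates how long data could sit in memory before an RFile exists to replicate. It assumes a minor compaction happens only when the in-memory map fills and that ingest is constant; the map sizes and the 10 MiB/s rate are invented numbers.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

def seconds_until_minc(map_bytes, ingest_bytes_per_sec):
    """Time to fill the in-memory map at a constant ingest rate.

    Under the stated assumptions this is roughly the longest that freshly
    written data waits before a minor compaction produces an RFile.
    """
    return map_bytes / ingest_bytes_per_sec

ingest_rate = 10 * MiB  # hypothetical sustained ingest, bytes/second

for map_size in (1 * GiB, 4 * GiB):
    wait = seconds_until_minc(map_size, ingest_rate)
    print(f"{map_size // GiB} GiB map: ~{wait:.1f} s before replication can begin")
```

Under these assumptions the wait scales linearly with the map size, which is the tension with the "bigger in-memory map" recommendation.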


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19790/#review39051
-----------------------------------------------------------


On March 28, 2014, 5:54 p.m., kturner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19790/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 5:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-378
>     https://issues.apache.org/jira/browse/ACCUMULO-378
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> ACCUMULO-378 Design document.  Posting for review here, not meant for commit.  Final
> version of document should be posted on issue.
> 
> 
> Diffs
> -----
> 
>   docs/src/main/resources/design/ACCUMULO-378-design.mdtext PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19790/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> kturner
> 
>

