accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Accumulo 1.7 and Data Center Replication
Date Fri, 27 Jun 2014 02:48:48 GMT
Hi Joe,

I'm the guy to ask if you'd like more information about the replication 
feature. You already found the parent ticket, so that has a bunch of 
technical "what's been done".

At a high level, replication was implemented as a framework in Accumulo 
to copy data that was written to a table to another "location". The 
provided initial implementation is to replicate the data as-is to 
another Accumulo table (usually in some other Accumulo instance). You'll 
also find a new page in the monitor and some basic administration tools 
in the code via Instance#replicationOperations.

I've published a recent version of the user manual[1] which goes into 
some more detail on the feature, as well as how to configure it.

You can also check the replication component on JIRA [2] to see what I 
have lined up. Automatically replicating bulk-loaded files will be a bit 
of work. There are some other minor things that could be improved. We 
can delve into the more technical implementation difficulties if you'd like.

I've written a basic test to evaluate equivalence by generating a Merkle 
tree for two tables. This has been promising so far, but it currently is 
living in my Github[3]. I need to figure out where/how best to include 
it in Apache.
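
To make the idea concrete, here is a minimal sketch of comparing two tables by Merkle root. It is not the code from the repository in [3]; it assumes each table can be read as a list of (key, value) string pairs, and the names leaf_hash, merkle_root and tables_equivalent are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import Iterable, List, Tuple

# Assumption: a (key, value) string pair stands in for an Accumulo Key/Value.
Entry = Tuple[str, str]


def leaf_hash(key: str, value: str) -> bytes:
    """Hash one entry; the length prefix keeps ("ab", "c") != ("a", "bc")."""
    data = f"{len(key)}:{key}{value}".encode("utf-8")
    # The 0x00 prefix separates leaf hashes from internal-node hashes.
    return hashlib.sha256(b"\x00" + data).digest()


def merkle_root(entries: Iterable[Entry]) -> bytes:
    """Build the tree bottom-up over the sorted entries and return its root."""
    level: List[bytes] = [leaf_hash(k, v) for k, v in sorted(entries)]
    if not level:
        return hashlib.sha256(b"").digest()  # fixed root for an empty table
    while len(level) > 1:
        parents: List[bytes] = []
        for i in range(0, len(level) - 1, 2):
            # The 0x01 prefix marks an internal node.
            parents.append(
                hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            )
        if len(level) % 2 == 1:
            parents.append(level[-1])  # odd node is promoted unchanged
        level = parents
    return level[0]


def tables_equivalent(a: Iterable[Entry], b: Iterable[Entry]) -> bool:
    """Two tables are treated as equivalent when their Merkle roots match."""
    return merkle_root(a) == merkle_root(b)


if __name__ == "__main__":
    primary = [("row1", "a"), ("row2", "b"), ("row3", "c")]
    replica = [("row3", "c"), ("row1", "a"), ("row2", "b")]
    print(tables_equivalent(primary, replica))      # same data, different order
    print(tables_equivalent(primary, primary[:2]))  # replica is missing a row
```

Comparing only the roots answers "are the tables the same?"; keeping the intermediate levels would additionally let you descend the tree to find which range of entries differs.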

Finally, having resources to do a larger-scale test would be great, and 
testing failure conditions over multiple nodes is probably the biggest 
area that needs to be tested more. I can simulate this on a small scale, 
but I don't have the resources to do an appropriate larger test with 
injected failure.

If you have something specific you'd like to help out with, I'd be happy 
to work with you.

<employer-hat>This feature will also be included in the next version of 
Accumulo shipped in HDP</employer-hat>

- Josh

[1] http://people.apache.org/~elserj/accumulo_user_manual.html#_replication
[2] https://issues.apache.org/jira/issues/?jql=project%20%3D%20ACCUMULO%20AND%20component%20%3D%20replication%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC
[3] https://github.com/joshelser/merkle

On 6/26/14, 8:50 PM, Joe Stein wrote:
> Hi, I was hoping to get some more info around the 1.7 release and what are
> the to-be-dos and plans around it?
>
> Is there any help that is needed from a contribution perspective in
> any way? Testing? Documentation? Pending coding or such?
>
> We are going to be rolling trunk into two of our lab environments
> specifically for https://issues.apache.org/jira/browse/ACCUMULO-378 as it
> is a requirement for one of my projects at Bloomberg for Accumulo to have
> data center replication before we go live.   This work is going to be over
> the next month(s) with lots of cycles dedicated to Accumulo 1.7 in the next
> few sprints.
>
> Also, I wanted to reach out if folks are looking for full time, contract or
> even side work with Accumulo. We have projects right now going on and are
> looking for more hands on keyboards.
>
> Anyways, thanks for all the great work!!!! I am looking forward to more
> continued success with the system, more integrations and to be able to
> become more active in the community.
>
> /*******************************************
>   Joe Stein
>   Founder, Principal Consultant
>   Big Data Open Source Security LLC
>   http://www.stealth.ly
>   Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
