accumulo-dev mailing list archives

From Joe Stein <joe.st...@stealth.ly>
Subject Re: Accumulo 1.7 and Data Center Replication
Date Fri, 27 Jun 2014 16:32:02 GMT
Hey Josh, the user manual is really great and helpful as a starting point!!!

Have you thought of / considered what an active/active solution might look
like?

One thought I had (perhaps naive, I don't know) is to have a table for
each data center, and then some iterators to materialize a combined view at
query time, perhaps compacted somewhere/somehow.

This way each data center can operate active/active, and folks can query
both data centers' tables (from a data perspective, DC1 replicates the DC1
table to DC2 and DC2 replicates the DC2 table to DC1).
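
A minimal sketch of the query-time merge described above, assuming each
data center's table can be read as a mapping from row key to a
(timestamp, value) pair and that the newest write should win. The names
merged_view, dc1_table, and dc2_table are hypothetical, and none of this
uses the Accumulo iterator API; it only illustrates the idea.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for a per-data-center table: row key -> (timestamp, value).
Table = Dict[str, Tuple[int, str]]


def merged_view(*tables: Table) -> Dict[str, str]:
    """Materialize one view across per-DC tables, keeping the newest write per key."""
    newest: Dict[str, Tuple[int, str]] = {}
    for table in tables:
        for key, (ts, value) in table.items():
            # Keep the entry with the larger timestamp; on a tie the earlier table wins.
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    # Return values only, ordered by row key.
    return {key: value for key, (ts, value) in sorted(newest.items())}


# Assumed example data: DC1 and DC2 each write only to their own table.
dc1_table = {"row1": (10, "a"), "row2": (20, "b")}
dc2_table = {"row2": (25, "b2"), "row3": (5, "c")}
view = merged_view(dc1_table, dc2_table)
```

Because each data center only ever writes to its own table, replication
stays one-directional per table, and conflict handling is pushed into
whatever merge rule is applied at read (or compaction) time.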

As for resources for more testing, can you be more specific? Are you
talking about 3 servers, 30 servers, 300 servers?

<employer-hat>
  <client-feather>
    We are currently using HDP 2.1, fwiw
    (http://hortonworks.com/customer/bloomberg/), so that is great news.
    What is the scheduled release date?
  </client-feather>
</employer-hat>

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/


On Thu, Jun 26, 2014 at 10:48 PM, Josh Elser <josh.elser@gmail.com> wrote:

> Hi Joe,
>
> I'm the guy to ask if you'd like more information about the replication
> feature. You already found the parent ticket, so that has a bunch of
> technical "what's been done".
>
> At a high level, replication was implemented as a framework in Accumulo to
> copy data that was written to a table to another "location". The provided
> initial implementation is to replicate the data as-is to another Accumulo
> table (usually some other Accumulo instance). You'll also find a new page
> in the monitor and some basic administration tools in the code via
> Instance#replicationOperations.
>
> I've published a recent version of the user manual[1] which goes into some
> more detail on the feature, as well as how to configure it.
>
> You can also check the replication component on JIRA [2] to see what I
> have lined up. Automatically replicating bulk-loaded files will be a bit of
> work. There are some other minor things that could be improved. We can
> delve into the more technical implementation difficulties if you'd like.
>
> I've written a basic test to evaluate equivalence by generating a Merkle
> tree for two tables. This has been promising so far, but it currently is
> living in my Github[3]. I need to figure out where/how best to include it
> in Apache.
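
A minimal sketch of the Merkle-tree equivalence check mentioned above,
assuming each table can be read as (key, value) string pairs. This is not
the code from the linked repository; merkle_root, leaf_hash, and the
sample tables are hypothetical names used only for illustration.

```python
import hashlib
from typing import Iterable, List, Tuple


def leaf_hash(key: str, value: str) -> bytes:
    """Hash one entry; a NUL separator is assumed not to occur in keys."""
    return hashlib.sha256(f"{key}\x00{value}".encode("utf-8")).digest()


def merkle_root(entries: Iterable[Tuple[str, str]]) -> bytes:
    """Build a binary Merkle tree over the sorted entries and return its root hash."""
    level: List[bytes] = [leaf_hash(k, v) for k, v in sorted(entries)]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node to pair it up
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Assumed example data: a source table, a faithful replica, and a stale replica.
source = [("row1", "a"), ("row2", "b"), ("row3", "c")]
replica = [("row3", "c"), ("row1", "a"), ("row2", "b")]
stale = [("row1", "a"), ("row2", "OLD"), ("row3", "c")]

tables_match = merkle_root(source) == merkle_root(replica)
stale_matches = merkle_root(source) == merkle_root(stale)
```

Comparing only the roots answers "are the tables equivalent?"; keeping the
intermediate levels would additionally allow narrowing down which key
range differs.
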
>
> Finally, having resources to do a larger-scale test would be great, and
> testing failure conditions over multiple nodes is probably the biggest area
> that needs to be tested more. I can simulate this on a small scale, but I
> don't have the resources to do an appropriate larger test with injected
> failure.
>
> If you have something specific you'd like to help out with, I'd be happy
> to work with you.
>
> <employer-hat>This feature will also be included in the next version of
> Accumulo shipped in HDP</employer-hat>
>
> - Josh
>
> [1] http://people.apache.org/~elserj/accumulo_user_manual.html#_replication
> [2] https://issues.apache.org/jira/issues/?jql=project%20%3D%20ACCUMULO%20AND%20component%20%3D%20replication%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC
> [3] https://github.com/joshelser/merkle
>
>
> On 6/26/14, 8:50 PM, Joe Stein wrote:
>
>> Hi, I was hoping to get some more info around the 1.7 release and what are
>> the to-be-dos and plans around it?
>>
>> Is there any help that is needed from a contribution perspective in
>> any way? Testing? Documentation? Pending coding or such?
>>
>> We are going to be rolling trunk into two of our lab environments
>> specifically for https://issues.apache.org/jira/browse/ACCUMULO-378 as it
>> is a requirement for one of my projects at Bloomberg for Accumulo to have
>> data center replication before we go live. This work is going to be over
>> the next month(s), with lots of cycles dedicated to Accumulo 1.7 in the next
>> few sprints.
>>
>> Also, I wanted to reach out if folks are looking for full time, contract or
>> even side work with Accumulo. We have projects right now going on and are
>> looking for more hands on keyboards.
>>
>> Anyways, thanks for all the great work!!!! I am looking forward to more
>> continued success with the system, more integrations and to be able to
>> become more active in the community.
>>
>> /*******************************************
>>   Joe Stein
>>   Founder, Principal Consultant
>>   Big Data Open Source Security LLC
>>   http://www.stealth.ly
>>   Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> ********************************************/
>>
>>
