accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: Backup and Recovery
Date Tue, 03 Oct 2017 20:40:42 GMT
Hi Mike. This is a great question. Accumulo has several options for backup.

Accumulo is backed by HDFS for persisting its data on disk. It may be
possible to use S3 directly at this layer. I'm not sure what the current
state is for doing something like this, but a brief Googling for "HDFS on
S3" shows a few historical projects which may still be active and mature.

Accumulo also has a replication feature to automatically mirror live ingest
to a pluggable external receiver, which could be a backup service you've
written to store data in S3. Recovery would depend on how you store the
data in S3. You could also implement an ingest system which stores data to
a backup as well as to Accumulo, to handle both live and bulk ingest.
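As a rough illustration of that dual-write idea (untested sketch; the instance, table, bucket, and key names below are all placeholders), each record could be written to Accumulo with a BatchWriter and mirrored to S3 with the AWS SDK:

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class DualWriteIngest {
      public static void main(String[] args) throws Exception {
        // Placeholder connection details -- substitute your own.
        Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
            .getConnector("ingestUser", new PasswordToken("secret"));
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        BatchWriter writer = conn.createBatchWriter("mytable", new BatchWriterConfig());
        String row = "row1";
        String value = "some data";

        // Live write into Accumulo.
        Mutation m = new Mutation(row);
        m.put("cf", "cq", new Value(value.getBytes()));
        writer.addMutation(m);
        writer.close();

        // Mirror the same record to the backup bucket.
        s3.putObject("my-backup-bucket", "backup/mytable/" + row, value);
      }
    }

How you key and batch the mirrored copies is up to you; the point is only that both stores see every record as it is ingested.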

Accumulo also has an "exporttable" feature, which exports the metadata for
a table, along with a list of files in HDFS for you to back up to S3 (or
another file system). Recovery involves using the "importtable" feature
which recreates the metadata, and bulk importing the files after you've
moved them from your backup location back onto HDFS.
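The same export/import steps are also available through the Java client API. A rough, untested sketch (instance details, table names, and paths are placeholders) might look like:

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.admin.TableOperations;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class ExportImportBackup {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
            .getConnector("admin", new PasswordToken("secret"));
        TableOperations ops = conn.tableOperations();

        // Backup: take the table offline so its set of files is stable,
        // then export its metadata and file listing.
        ops.offline("mytable", true);
        ops.exportTable("mytable", "/accumulo-export/mytable");
        // Copy the export directory and the files it lists to S3
        // (e.g., with distcp or the AWS CLI).

        // Recovery: copy everything back onto HDFS, then recreate the table
        // from the exported metadata and data files.
        ops.importTable("mytable_restored", "/accumulo-import/mytable");
      }
    }

The export directory contains the table metadata plus a listing of the table's files; the import step expects those files to have been copied back under the import directory first.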

This is just a rough outline of three possible approaches. I don't know which
(if any) would match your requirements best, and there may be many other
options as well.

On Tue, Oct 3, 2017 at 4:10 PM <mikewestman@zapatatechnology.com> wrote:

> Please forgive the newbie question. What options are there for backup and
> recovery of accumulo data?
>
>
>
> Ideally I would like something that would replicate to S3 in realtime.
>
>
