incubator-accumulo-user mailing list archives

From Adam Fuchs <adam.p.fu...@ugov.gov>
Subject Re: checkpointing
Date Sat, 31 Dec 2011 15:14:28 GMT
I believe what you're looking for is what we've been calling table cloning,
which is new to Accumulo 1.4:
http://incubator.apache.org/accumulo/user_manual_1.4-incubating/Table_Configuration.html#Cloning_Tables
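
To illustrate why a clone can serve as a checkpoint, here is a minimal toy
model in plain Java. It is not Accumulo code and every name in it is
hypothetical. It assumes only that table files are immutable, that a clone
copies file references rather than data, and that the garbage collector
deletes a file only when no table references it. In the 1.4 client the real
operation should be available as TableOperations.clone(...) and as the
clonetable shell command; check the manual linked above for the exact
arguments.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical names, not the Accumulo API).
final class CloneCheckpointSketch {
    // Table name -> names of the immutable files it references.
    // This map stands in for the entries kept in the metadata table.
    private final Map<String, Set<String>> metadata = new HashMap<>();
    // Every file that currently exists on disk.
    private final Set<String> onDisk = new HashSet<>();

    void createTable(String table) {
        metadata.put(table, new HashSet<>());
    }

    // New data arrives as a new file; existing files are never modified.
    void addFile(String table, String file) {
        metadata.get(table).add(file);
        onDisk.add(file);
    }

    // A clone copies the file references only, so it is cheap.
    void cloneTable(String src, String dest) {
        metadata.put(dest, new HashSet<>(metadata.get(src)));
    }

    void deleteTable(String table) {
        metadata.remove(table);
    }

    // Rollback: drop the damaged table and re-clone it from the checkpoint.
    void restore(String table, String checkpoint) {
        deleteTable(table);
        cloneTable(checkpoint, table);
    }

    // The collector removes only files that no table references,
    // so files held by a checkpoint clone survive.
    Set<String> collectGarbage() {
        Set<String> referenced = new HashSet<>();
        for (Set<String> files : metadata.values()) {
            referenced.addAll(files);
        }
        Set<String> removed = new HashSet<>(onDisk);
        removed.removeAll(referenced);
        onDisk.removeAll(removed);
        return removed;
    }

    Set<String> filesOf(String table) {
        return new HashSet<>(metadata.get(table));
    }

    Set<String> existingFiles() {
        return new HashSet<>(onDisk);
    }
}
```

In this model, cloning "counts" to "counts_ckpt" before a risky job, then
calling restore("counts", "counts_ckpt") after a failure, leaves the table
pointing at its pre-job files, and only the files written by the failed job
become eligible for collection.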

Adam


On Fri, Dec 30, 2011 at 4:59 PM, Aaron Cordova <aaron@cordovas.org> wrote:

> Did a checkpoint feature ever get added?
>
> If not, would it still be possible to do so, perhaps by taking the table
> to be checkpointed offline (or compacting it, or whatever) and then
> copying the relevant parts of the !METADATA table to another table? Then,
> for the rollback / restore process, simply copy the metadata back into
> the !METADATA table.
>
> Of course, the garbage collector would have to know not to garbage collect
> files from the checkpoint.
>
> It would probably be easier to implement by marking entries in the
> !METADATA table as part of a checkpoint, which could also be unmarked to
> 'delete' the checkpoint.
>
> This feature would be very useful in building aggregate tables, when it's
> possible that some new additions may get messed up. In particular, during
> MapReduce jobs that write to an aggregated Accumulo table, speculative
> execution and retried tasks that already wrote some results can cause
> double counting / aggregation of some entries. It'd be very nice if one
> could checkpoint an aggregated table before starting such a job, in case
> failures corrupt the counts.
>
> Thoughts?
>
> Aaron
