incubator-accumulo-user mailing list archives

From Aaron Cordova <aa...@cordovas.org>
Subject checkpointing
Date Fri, 30 Dec 2011 21:59:24 GMT
Did a checkpoint feature ever get added? 

If not, would it still be possible to add one, perhaps by taking the table to be checkpointed
offline (or compacting it, or whatever), then copying the relevant parts of the !METADATA table
to another table? Then, for the rollback / restore process, simply copy the metadata back
into the !METADATA table.

Of course, the garbage collector would have to know not to garbage collect files from the
checkpoint.
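
To make the idea concrete, here is a toy sketch in Python. It is only a model: plain dicts and sets stand in for the !METADATA table and for files in HDFS, and the function names (checkpoint, restore, garbage_collect) are made up for illustration, not Accumulo APIs. It assumes the table's file set is stable while the checkpoint is taken (e.g. the table is offline).

```python
# Toy model only: dicts and sets stand in for the !METADATA table and for
# files on disk. None of these names are real Accumulo APIs.

def checkpoint(metadata, checkpoints, table, name):
    """Copy the table's tablet -> files entries into a side store."""
    checkpoints[name] = {
        tablet: set(files)
        for (tbl, tablet), files in metadata.items()
        if tbl == table
    }

def restore(metadata, checkpoints, table, name):
    """Replace the table's live entries with the checkpointed ones."""
    for key in [k for k in metadata if k[0] == table]:
        del metadata[key]
    for tablet, files in checkpoints[name].items():
        metadata[(table, tablet)] = set(files)

def garbage_collect(all_files, metadata, checkpoints):
    """Keep only files referenced by live metadata or by any checkpoint."""
    live = set().union(*metadata.values())
    for snap in checkpoints.values():
        for files in snap.values():
            live |= files
    return all_files & live

metadata = {("agg", "t1"): {"f1.rf"}, ("agg", "t2"): {"f2.rf"}}
checkpoints = {}
files = {"f1.rf", "f2.rf"}

checkpoint(metadata, checkpoints, "agg", "before_job")

# A later compaction replaces f1.rf with f3.rf in the live table.
metadata[("agg", "t1")] = {"f3.rf"}
files.add("f3.rf")

# f1.rf is no longer live, but the checkpoint still references it.
files = garbage_collect(files, metadata, checkpoints)

restore(metadata, checkpoints, "agg", "before_job")
```

In this model, deleting the checkpoint is just removing its entry from the side store; the next garbage_collect pass is then free to drop any files that only the checkpoint referenced.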

It would probably be easier to implement by marking entries in the !METADATA table as part
of a checkpoint; the entries could later be unmarked to 'delete' the checkpoint.

This feature would be very useful in building aggregate tables, when it's possible that some
new additions may get messed up. In particular, during MapReduce jobs that write to
an aggregated Accumulo table, speculative execution and retried tasks that already wrote some
results can cause double counting / aggregation of some entries. It'd be very nice if one could
checkpoint an aggregated table before starting such a job, in case failures corrupt the counts.
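
A small Python illustration of the double-counting problem, and of how a rollback would help. This is a toy sketch: a Counter stands in for a table with a summing aggregator, copying it stands in for a checkpoint, and run_task is a hypothetical helper, not anything from Accumulo or Hadoop.

```python
from collections import Counter

def run_task(table, records):
    """Add each (key, count) pair to the table, as a summing aggregator would."""
    for key, n in records:
        table[key] += n

table = Counter({"a": 10})
snapshot = Counter(table)        # stand-in for checkpointing the table

records = [("a", 1), ("b", 2)]

run_task(table, records[:1])     # first attempt writes part of its output, then fails
run_task(table, records)         # the retry writes everything again
double_counted = table["a"]      # 12, although the correct total is 11

table = Counter(snapshot)        # roll back to the checkpoint
run_task(table, records)         # rerun the job cleanly
```

After the rollback and clean rerun, the table holds a = 11 and b = 2, which is what a single successful run would have produced.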

Thoughts?

Aaron