accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-3090) VolumeChooser should be able to decide per-file
Date Fri, 29 Aug 2014 17:07:53 GMT


Josh Elser commented on ACCUMULO-3090:

I disagree with you. Replication is a way to implement backup; recovery (really, failover)
is left as an exercise for the user. A full recovery solution would likely include snapshot+distcp+import.

You also haven't made clear what you mean by "backup": the HDFS operations you mentioned
in your earlier post sound like you're talking about backing up only the files (which
has a host of problems, most of which stem from the requirement to cleanly shut down
Accumulo before you can do anything).

But, getting back to the original point of this, if a tablet is split across volumes (which
can be assumed to mean separate HDFS instances), Sean is right in that it would make recovery
of a complete tablet horrible. Personally, I think 'sticking' a tablet to a volume simplifies
a lot of logic (though full URIs in the metadata table do help quite a bit). Since a table is
made up of many tablets, it makes sense to keep each tablet stored in a single location
(the space available on a volume should greatly exceed the size of a tablet, even
at tens or hundreds of GB).

This needs much more explanation of the problem, a proposed solution to it, and the pros/cons
of that solution.

> VolumeChooser should be able to decide per-file
> -----------------------------------------------
>                 Key: ACCUMULO-3090
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Christopher Tubbs
> Currently, the VolumeChooser decides only once per-tablet which volume to use for that
tablet. The directory is "sticky" after the decision is made. This can cause unexpected behavior
for users, which makes it harder to manage volume usage/capacity.
> One unexpected behavior is that data will still be written to an existing tablet's predetermined
volume, even if the volume is removed from instance.volumes.
> Another unexpected behavior occurs when adding a new volume. One might
expect to compact tablets after adding a new volume and have the new usage balanced
across all the volumes (using the provided RandomVolumeChooser), but due to the stickiness,
that is not the behavior seen. Instead, only new tablets (from new tables or new splits)
will begin to randomly use the new volume.
> If the sticky behavior is desired, a volume chooser could still do that, by accepting
a "preferredTabletVolume"/"preferredTabletDirectory" in its environment, provided by the caller,
to use to make decisions.
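
For illustration, the suggestion in the description (a per-file chooser that can still be
sticky when the caller supplies a preferred volume) might look roughly like the sketch below.
This is only a sketch under assumptions: ChooserEnv, PerFileVolumeChooser, RandomPerFileChooser
and StickyPerFileChooser are hypothetical names and are not the actual Accumulo VolumeChooser
API; only the "preferredTabletVolume" idea comes from the issue text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Random;

/** Hypothetical environment the caller would pass on every file creation. */
final class ChooserEnv {
    private final Optional<String> preferredTabletVolume;

    /** @param preferredTabletVolume the tablet's current volume, or null if none */
    ChooserEnv(String preferredTabletVolume) {
        this.preferredTabletVolume = Optional.ofNullable(preferredTabletVolume);
    }

    Optional<String> preferredTabletVolume() {
        return preferredTabletVolume;
    }
}

/** Hypothetical per-file chooser interface (not the real Accumulo interface). */
interface PerFileVolumeChooser {
    String choose(ChooserEnv env, List<String> volumes);
}

/** Picks a random configured volume for every file, ignoring any preference. */
final class RandomPerFileChooser implements PerFileVolumeChooser {
    private final Random random;

    RandomPerFileChooser(Random random) {
        this.random = random;
    }

    @Override
    public String choose(ChooserEnv env, List<String> volumes) {
        if (volumes.isEmpty()) {
            throw new IllegalArgumentException("no volumes configured");
        }
        return volumes.get(random.nextInt(volumes.size()));
    }
}

/**
 * Keeps the sticky behavior, but only while the preferred volume is still
 * in the configured list; otherwise it delegates to the fallback chooser.
 */
final class StickyPerFileChooser implements PerFileVolumeChooser {
    private final PerFileVolumeChooser fallback;

    StickyPerFileChooser(PerFileVolumeChooser fallback) {
        this.fallback = fallback;
    }

    @Override
    public String choose(ChooserEnv env, List<String> volumes) {
        return env.preferredTabletVolume()
                .filter(volumes::contains)
                .orElseGet(() -> fallback.choose(env, volumes));
    }
}
```

With a chooser shaped like this, a tablet whose preferred volume has been removed from the
configured list would stop writing there, while tablets on still-configured volumes would
stay put. Whether that trade-off is acceptable is exactly what the comment above asks to
have spelled out.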

This message was sent by Atlassian JIRA
