accumulo-dev mailing list archives

From ShawnWalker <...@git.apache.org>
Subject [GitHub] accumulo pull request #121: ACCUMULO-4353: Stabilize tablet assignment durin...
Date Thu, 23 Jun 2016 20:59:44 GMT
GitHub user ShawnWalker opened a pull request:

    https://github.com/apache/accumulo/pull/121

    ACCUMULO-4353: Stabilize tablet assignment during transient failure.

    Added configuration property `table.suspend.duration` (default 0s): When a tablet server
dies, its tablets are no longer placed immediately in the `TabletState.UNASSIGNED` state;
they are moved to the new `TabletState.SUSPENDED` state instead.  A suspended tablet will only
be reassigned if (a) `table.suspend.duration` has passed since the tablet was suspended, or
(b) the tablet server most recently hosting the tablet has come back online.  In the latter
case, the tablet will be assigned back to its previous host.
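
    The reassignment rule above can be sketched roughly as follows.  This is an illustration
only, not code from the pull request: the class `SuspensionDecisionSketch`, the `Action` enum,
and the `decide` method are hypothetical names.  It also assumes that condition (b) is checked
before condition (a) when both hold, and that "has passed" means elapsed time greater than or
equal to the configured duration; the description does not specify either point.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical sketch of the reassignment rule; not code from the pull request. */
final class SuspensionDecisionSketch {

  /** What to do next with a tablet that is currently suspended. */
  enum Action { STAY_SUSPENDED, ASSIGN_TO_PREVIOUS_HOST, UNASSIGN }

  static Action decide(long nowMillis, long suspendedAtMillis, long suspendDurationMillis,
      String previousHost, Set<String> liveServers) {
    // (b) The previous host is back online: hand the tablet back to it.
    // Checking this first is an assumption made for the sketch.
    if (liveServers.contains(previousHost)) {
      return Action.ASSIGN_TO_PREVIOUS_HOST;
    }
    // (a) table.suspend.duration has passed: treat the tablet as unassigned.
    if (nowMillis - suspendedAtMillis >= suspendDurationMillis) {
      return Action.UNASSIGN;
    }
    // Neither condition holds yet, so the tablet stays suspended.
    return Action.STAY_SUSPENDED;
  }

  public static void main(String[] args) {
    // Ten seconds into a sixty-second suspension, previous host still down.
    Set<String> live = Collections.singleton("host-b:9997");
    System.out.println(decide(10_000L, 0L, 60_000L, "host-a:9997", live));
  }
}
```

    Under these assumptions, a duration of 0 makes condition (a) true as soon as the previous
host is known to be down, so the tablet is treated as unassigned right away.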
    
    Added configuration property `master.metadata.suspendable` (default false): The above
functionality is meant to be used only on user tablets, because suspending metadata tablets
can lead to a much more significant loss of availability.  Even so, it is possible to set
`table.suspend.duration` on `accumulo.metadata`.  To actually allow metadata tablets to be
suspended as well, one must also set `master.metadata.suspendable` to true.
    
    I chose not to implement suspension of the root tablet.
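
    As a rough illustration of how the two properties interact, the eligibility check might
look like the sketch below.  The names `SuspendEligibilitySketch`, `TabletKind`, and `maySuspend`
are hypothetical and do not come from the pull request.  The sketch also assumes that a
`table.suspend.duration` of 0 means a tablet is never suspended, which the description implies
but does not state.

```java
/** Hypothetical sketch of which tablets may be suspended; not code from the pull request. */
final class SuspendEligibilitySketch {

  enum TabletKind { USER, METADATA, ROOT }

  /**
   * @param suspendDurationMillis the table's table.suspend.duration, in milliseconds
   * @param metadataSuspendable the value of master.metadata.suspendable
   */
  static boolean maySuspend(TabletKind kind, long suspendDurationMillis,
      boolean metadataSuspendable) {
    // Assumption: a zero (default) duration disables suspension for the table.
    if (suspendDurationMillis <= 0) {
      return false;
    }
    switch (kind) {
      case ROOT:
        // Suspension of the root tablet is not implemented.
        return false;
      case METADATA:
        // Metadata tablets additionally require the master-level opt-in.
        return metadataSuspendable;
      default:
        return true;
    }
  }
}
```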
    
    Implementation outline:
    * `master.MasterTime` maintains an approximately monotonic clock; this is used by suspension
to determine how much time has passed since a tablet was suspended.  `MasterTime` periodically
writes its time base to ZooKeeper for persistence.
    * The `server.master.state.TabletState` enum now has a `TabletState.SUSPENDED` state.
 `TabletLocationState` and `MetaDataStateStore` were updated to properly read and write suspensions.
    * `server.master.state.TabletStateStore` now features a `suspend(...)` method for suspending
a tablet, with an implementation in `MetaDataStateStore`.  `suspend(...)` acts just like `unassign(...)`,
except that it writes additional metadata indicating when each tablet was suspended and which
tablet server it was suspended from.
    * `master.TabletServerWatcher` updated to properly transition to/from `TabletState.SUSPENDED`.
    * `master.Master` updated to avoid balancing while any tablets remain suspended.
    * Minor changes to various miniaccumulo classes to facilitate testing tablet suspension,
 in particular the ability to (a) restart a server with a configuration different from the
rest, and (b) restart fewer than the full complement of tablet servers.
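
    To illustrate the first item in the outline, here is a minimal sketch of a clock that
persists its time base so that elapsed time survives a restart.  It is not the `MasterTime`
implementation: the class `PersistedClockSketch` is hypothetical, an `AtomicLong` stands in for
the ZooKeeper node, and the injected `LongSupplier` stands in for a local monotonic source such
as `System::nanoTime`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a clock with a persisted time base; not the MasterTime code. */
final class PersistedClockSketch {
  private final LongSupplier ticker; // local monotonic source with an arbitrary origin
  private final AtomicLong store;    // stand-in for the value kept in ZooKeeper
  private final long offset;         // maps local ticks onto the persisted timeline

  PersistedClockSketch(LongSupplier ticker, AtomicLong store) {
    this.ticker = ticker;
    this.store = store;
    // Resume from the last persisted time so readings never go backwards across restarts.
    this.offset = store.get() - ticker.getAsLong();
  }

  /** Current time on the persisted timeline. */
  long getTime() {
    return ticker.getAsLong() + offset;
  }

  /** Intended to be called periodically; time elapsed after the last call is lost on a crash. */
  void persist() {
    store.set(getTime());
  }

  public static void main(String[] args) {
    AtomicLong store = new AtomicLong(0L);
    PersistedClockSketch clock = new PersistedClockSketch(System::nanoTime, store);
    clock.persist();
    System.out.println(store.get() >= 0L);
  }
}
```

    The clock is only approximately monotonic in this sketch because any time that elapses
between the last `persist()` call and a restart is not counted.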

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ShawnWalker/accumulo ACCUMULO-4353

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/121.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #121
    
----
commit 8d097c5f4388be83e90325e5b6d674ad21a6121b
Author: Shawn Walker <accumulo@shawn-walker.net>
Date:   2016-06-21T17:34:34Z

    ACCUMULO-4353: Stabilize tablet assignment during transient failure.
    

----


