accumulo-dev mailing list archives

From ShawnWalker <...@git.apache.org>
Subject [GitHub] accumulo issue #121: ACCUMULO-4353: Stabilize tablet assignment during trans...
Date Tue, 05 Jul 2016 14:51:27 GMT
Github user ShawnWalker commented on the issue:

    https://github.com/apache/accumulo/pull/121
  
    > The stop-here.sh command has the master unload the tablets I think. How will this
patch handle that case?
    This patch won't handle such a case at all.  I'm sure it shows my inexperience with Accumulo,
but I was unaware of this script.  I'm more familiar with engineering and dealing with [crash-only
software](https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf).
 I had assumed that a tserver would be stopped by SIGTERM or SIGKILL.
    
    I'm open to suggestions on how to handle this use case.  My current thought would be to
make unloading a tablet this way suspend the tablet instead of unassigning it.  I.e. in `tserver.TabletServer.UnloadTabletHandler.run()`
at line 2012, call `TabletStateStore.suspend(...)` instead of `TabletStateStore.unassign(...)`.
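
A minimal sketch of that idea, assuming nothing about Accumulo's real signatures: `SketchStateStore`, `SketchUnloadHandler`, and the `suspendOnUnload` switch are hypothetical stand-ins, and only the choice between `suspend(...)` and `unassign(...)` mirrors the suggestion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; not Accumulo's real classes or signatures.
enum SketchTabletState { UNASSIGNED, SUSPENDED }

class SketchStateStore {
    // Tablet id -> current state.
    final Map<String, SketchTabletState> states = new HashMap<>();
    // Tablet id -> "host:port" of the server the tablet was suspended from.
    final Map<String, String> suspendedFrom = new HashMap<>();

    void unassign(String tablet) {
        states.put(tablet, SketchTabletState.UNASSIGNED);
        suspendedFrom.remove(tablet);
    }

    void suspend(String tablet, String hostAndPort) {
        states.put(tablet, SketchTabletState.SUSPENDED);
        suspendedFrom.put(tablet, hostAndPort);
    }
}

class SketchUnloadHandler {
    private final SketchStateStore store;
    private final String serverAddress;
    private final boolean suspendOnUnload; // assumed switch, not an existing option

    SketchUnloadHandler(SketchStateStore store, String serverAddress, boolean suspendOnUnload) {
        this.store = store;
        this.serverAddress = serverAddress;
        this.suspendOnUnload = suspendOnUnload;
    }

    void run(String tablet) {
        // (Flushing and closing the tablet would happen before this point.)
        if (suspendOnUnload) {
            // Remember where the tablet lived so it can return there later.
            store.suspend(tablet, serverAddress);
        } else {
            store.unassign(tablet);
        }
    }
}
```

With `suspendOnUnload` set, a tablet unloaded for a planned stop ends up in the same state as one whose server died, so both cases would follow the same reassignment path.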
    
    > When a tablet server is suspended, all queries will block right?
    When a *tablet* is suspended, all queries against that tablet do seem to block (or possibly
time out).
    
    > I see you are suspending the metadata tablets too.
    By default, metadata tablets won't be suspended, even if the metadata table (or global
configuration) has `tablet.suspend.duration` set.  One must also set the option `master.metadata.suspendable`
to true (default false). The check for this is handled at Master.java:1154. 
    
    Note to self: Looking back at that code, I realize that this check is made only once (at startup), instead of rechecking for updated configuration.  It should probably be re-evaluated each time it is needed, so that a configuration change takes effect without a restart.
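
The difference between the two approaches can be sketched as follows. This is an illustration only: the class names are hypothetical, and the `BooleanSupplier` stands in for however the master would read `master.metadata.suspendable` from its configuration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: the flag is read once and cached, so later changes are missed.
class StartupOnlyCheck {
    private final boolean suspendable;

    StartupOnlyCheck(BooleanSupplier config) {
        this.suspendable = config.getAsBoolean(); // evaluated only at construction
    }

    boolean maySuspendMetadata() {
        return suspendable;
    }
}

// Hypothetical sketch: the flag is re-read on every call, so updates are picked up.
class RecheckingCheck {
    private final BooleanSupplier config;

    RecheckingCheck(BooleanSupplier config) {
        this.config = config;
    }

    boolean maySuspendMetadata() {
        return config.getAsBoolean(); // evaluated on each use
    }
}

class RecheckDemo {
    public static void main(String[] args) {
        // Stand-in for a live configuration value that starts out false.
        AtomicBoolean liveSetting = new AtomicBoolean(false);
        StartupOnlyCheck once = new StartupOnlyCheck(liveSetting::get);
        RecheckingCheck every = new RecheckingCheck(liveSetting::get);

        liveSetting.set(true); // an operator changes the setting at runtime

        System.out.println("startup-only sees: " + once.maySuspendMetadata());
        System.out.println("rechecking sees:   " + every.maySuspendMetadata());
    }
}
```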
    
    > I see you are storing the host and port in the metadata for a suspended tablet. Sometimes
we have tservers come up with a different host or port. In that case, I guess the tablets
will wait until the suspend duration to be reassigned.
    This is correct.  Tablet suspension is essentially incompatible with dynamic port assignment.  Of course, this wouldn't be the only part of Accumulo to suffer under random/dynamic port assignment.  Setting `tserv.port.client` to 0 or `tserv.port.search` to true breaks assumptions in other places too.  Some I know of:
    * I decided to match host+port based on code in `server.master.balance.DefaultLoadBalancer.getAssignment()`.
 That code uses host+port to match a tablet's `last` column, for preserving locality.  If
the tserver's port changes, the `last` column is effectively ignored, reducing locality.
    * Having walked the logic path for `stop-here.sh`, my read is that `server.util.Admin.stopTabletServer(...)` (used by `stop-here.sh`) assumes that the tserver(s) on the specified host (or on localhost, if none is specified) listen on the port(s) given by `tserv.port.client`.  Hence, running a tserver with `tserv.port.client` set to 0 will render `stop-here.sh` ineffective.  Similarly, running a tserver with `tserv.port.search` set to true risks rendering `stop-here.sh` ineffective.
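
To illustrate why a changed port leaves a suspended tablet waiting, here is a rough sketch of host+port matching. All names are hypothetical and the logic is a simplification of the behaviour described above, not the patch's actual code.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical outcomes for a suspended tablet; not Accumulo's real types.
enum SuspendDecision { RETURN_TO_SERVER, KEEP_WAITING, REASSIGN_ELSEWHERE }

class SuspendedTabletSketch {
    /**
     * Decide what to do with a suspended tablet, matching on the exact
     * "host:port" string that was stored when the tablet was suspended.
     */
    static SuspendDecision decide(String suspendedAddress, long suspendedAtMillis,
                                  long suspendDurationMillis, long nowMillis,
                                  Set<String> liveServers) {
        if (liveServers.contains(suspendedAddress)) {
            // The same host AND port is back: hand the tablet straight back.
            return SuspendDecision.RETURN_TO_SERVER;
        }
        if (nowMillis - suspendedAtMillis < suspendDurationMillis) {
            // No exact match yet and the suspend duration has not elapsed.
            return SuspendDecision.KEEP_WAITING;
        }
        // Suspension expired: treat the tablet like any unassigned tablet.
        return SuspendDecision.REASSIGN_ELSEWHERE;
    }

    public static void main(String[] args) {
        // The tserver restarted on the same host but came up on a different port.
        Set<String> live = Collections.singleton("host1:40123");
        System.out.println(decide("host1:9997", 0L, 60_000L, 10_000L, live));
        System.out.println(decide("host1:9997", 0L, 60_000L, 61_000L, live));
    }
}
```

In this sketch a server that returns on a new port never matches the stored address, so the tablet only becomes assignable again once the suspend duration has run out.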



