accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3471) Adding a new tserver puts some tables offline for few minutes
Date Wed, 14 Jan 2015 15:15:35 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277030#comment-14277030 ]

Josh Elser commented on ACCUMULO-3471:
--------------------------------------

Thinking about this some more...

To get batches of assignments, it might be more straightforward to have the master bin
together the tablets that don't require log recovery. The tabletserver is still going to
process incoming assignments sequentially to avoid problems with resource usage by
recovery. If the tabletserver receives a collection of extents to load (instead of just a
single extent), it would be easy to bring all of the tablets online cleanly.
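A rough sketch of what the master-side binning could look like. The class and field names here (AssignmentBinner, Assignment, needsRecovery) are hypothetical stand-ins for illustration, not Accumulo's actual classes, and extents and servers are reduced to plain strings:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: bins assignments per tablet server so that tablets
// without log recovery travel together in one batch.
final class AssignmentBinner {

    // Stand-in for a tablet assignment; NOT an Accumulo type.
    static final class Assignment {
        final String extent;
        final String server;
        final boolean needsRecovery;

        Assignment(String extent, String server, boolean needsRecovery) {
            this.extent = extent;
            this.server = server;
            this.needsRecovery = needsRecovery;
        }
    }

    // Returns, per server, a list of batches. All recovery-free extents for a
    // server share one batch; each extent needing recovery is its own batch.
    static Map<String, List<List<String>>> bin(List<Assignment> assignments) {
        Map<String, List<String>> cleanBatchByServer = new LinkedHashMap<>();
        Map<String, List<List<String>>> result = new LinkedHashMap<>();

        for (Assignment a : assignments) {
            List<List<String>> batches =
                result.computeIfAbsent(a.server, s -> new ArrayList<>());
            if (a.needsRecovery) {
                // Recovery is resource-heavy, so keep it as a single-extent batch.
                List<String> single = new ArrayList<>();
                single.add(a.extent);
                batches.add(single);
            } else {
                List<String> clean = cleanBatchByServer.get(a.server);
                if (clean == null) {
                    clean = new ArrayList<>();
                    cleanBatchByServer.put(a.server, clean);
                    batches.add(clean);
                }
                clean.add(a.extent);
            }
        }
        return result;
    }
}
```

The tabletserver would then still work through its batches one at a time, but a batch of recovery-free extents could be brought online in a single pass.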

Another option would be to rework the recovery code so that recoveries could be handled in
parallel (in the eyes of the caller -- they might actually be executed serially behind the
scenes). The tablet server could build up a collection of extents to load and process them
itself. This sounds more difficult to me than the first suggestion.
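For comparison, the second option might be shaped roughly like this: the caller submits recoveries and gets futures back, while a single worker thread runs them one at a time. Everything here (SerialRecoveryQueue, submit, loadAll, the placeholder recover step) is a hypothetical illustration, not the existing tserver recovery code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: recoveries look asynchronous to the caller but are
// executed serially on one worker thread to bound resource usage.
final class SerialRecoveryQueue implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> recoveryOrder =
        Collections.synchronizedList(new ArrayList<>());

    // Returns immediately; the recovery itself is queued behind earlier ones.
    CompletableFuture<String> submit(String extent) {
        return CompletableFuture.supplyAsync(() -> recover(extent), worker);
    }

    // Placeholder for real log recovery; it only records the execution order.
    private String recover(String extent) {
        recoveryOrder.add(extent);
        return extent;
    }

    // Builds up a collection of extents, then waits until all are recovered.
    List<String> loadAll(List<String> extents) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String extent : extents) {
            pending.add(submit(extent));
        }
        List<String> online = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            online.add(f.join());
        }
        return online;
    }

    // Snapshot of the order in which recoveries actually ran.
    List<String> recoveryOrder() {
        synchronized (recoveryOrder) {
            return new ArrayList<>(recoveryOrder);
        }
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

The single-threaded executor is what keeps execution serial here; swapping it for a bounded pool would allow limited real parallelism without changing the caller's view.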

> Adding a new tserver puts some tables offline for few minutes
> -------------------------------------------------------------
>
>                 Key: ACCUMULO-3471
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3471
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 12.04
>            Reporter: Denis Petrov
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: ACCUMULO-3471-balance-test.patch
>
>
> I run an Accumulo cluster with 15 tservers with about 6000 tablets on each (disks are
> quite slow: each node has 2*4TB SATA).
> When a new tserver is added to the cluster, the rebalancing procedure starts.
> During this procedure some tablets are offline and unreachable for 5-10 minutes.
> This is visible in http://monitor:50095/tables and as timeouts on the client side.
> The rebalancing caused by killing a tserver converges much faster than the rebalancing
> caused by adding a tserver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
