accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-3471) Adding a new tserver puts some tables offline for few minutes
Date Tue, 13 Jan 2015 20:07:35 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-3471:
-----------------------------------
    Attachment: ACCUMULO-3471-balance-test.patch

The balance test patch is a modification of an existing unit test.  I wrote it just to check
that the balancer itself is not doing anything strange in this situation.

The test assigns 600 tablets to 15 tservers, adds one tserver, and balances.  375 tablets
are scheduled to be moved to the new tserver, and then it is stable.
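
To illustrate the kind of check described above, here is a minimal sketch of an
even-spread model.  It is not Accumulo's balancer and not the code in the attached
patch; the function name plan_even_spread is hypothetical.  The tablet totals are
assumptions as well: this toy model reproduces the 375 figure when 6000 tablets
start on the 15 tservers, and yields 37 when 600 do.

```python
def plan_even_spread(counts):
    """Toy model (not Accumulo's balancer): even out tablet counts.

    counts[i] is the number of tablets currently on tserver i.
    Returns (targets, moved): the per-tserver counts after balancing and
    the number of tablets that have to migrate to reach them.
    """
    total = sum(counts)
    base, extra = divmod(total, len(counts))
    # Rank tservers from most to least loaded so that the `extra` leftover
    # tablets stay on tservers that already hold them.
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    targets = list(counts)
    for rank, i in enumerate(order):
        targets[i] = base + (1 if rank < extra else 0)
    # Every tablet above a tserver's target has to move somewhere else.
    moved = sum(max(0, c - t) for c, t in zip(counts, targets))
    return targets, moved


# Assumed starting point: 15 evenly loaded tservers plus one new, empty one.
before = [400] * 15 + [0]
after, moved = plan_even_spread(before)
print(moved, after[-1])

# A second pass plans no further moves, i.e. the layout is stable.
_, moved_again = plan_even_spread(after)
print(moved_again)
```

The second call mirrors the "then it is stable" check: once the counts are even,
another balancing pass should schedule nothing.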

I tried running it against 1.6.1 and latest 1.6.2-SNAP and saw no problems w/ the default
balancer.  

[~denistmp] are you using hflush or hsync?  (the default is hflush).  I am wondering if hflush
is causing metadata table updates related to migrating tablets to take a long time.


> Adding a new tserver puts some tables offline for few minutes
> -------------------------------------------------------------
>
>                 Key: ACCUMULO-3471
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3471
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 12.04
>            Reporter: Denis Petrov
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: ACCUMULO-3471-balance-test.patch
>
>
> I run an Accumulo cluster with 15 tservers with about 6000 tablets on each (disks are quite slow - each node has 2*4Tb SATA).
> When a new tserver is added to the cluster, the rebalancing procedure starts.
> During this procedure some tablets are offline and unreachable for 5-10 minutes.
> It is visible in http://monitor:50095/tables and by timeouts on the client side.
> The rebalancing caused by killing a tserver converges much faster than rebalancing caused by adding a tserver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
