accumulo-notifications mailing list archives

From "Denis Petrov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3471) Adding a new tserver puts some tables offline for few minutes
Date Wed, 14 Jan 2015 05:50:35 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276518#comment-14276518 ]

Denis Petrov commented on ACCUMULO-3471:
----------------------------------------

Anyway, some throttling of the balancing would be useful.
All of those 200 unloaded tablets (in the big cluster there would be 6000) are supposed to be
loaded by a single tserver.
With my performance bug, the last of the unloaded tablets was loaded again after 33 seconds
of being offline (and, well, after minutes in the big cluster).
But this would put some stress on the tserver in any setup.

====== master.log ====== 
2015-01-14 05:29:03,585 [master.Master] INFO : New servers: [c3:9997[24ad5756645001d]]
2015-01-14 05:29:03,586 [master.EventCoordinator] INFO : There are now 11 tablet servers
2015-01-14 05:29:04,037 [master.EventCoordinator] INFO : Migrating 200 more tablets, 200 total
2015-01-14 05:29:04,360 [master.EventCoordinator] INFO : [Normal Tablets]: 200 tablets unloaded
======
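
The throttling suggested above could be sketched roughly as follows. This is only an illustration of the idea, not Accumulo's balancer code: the class MigrationThrottle, the method nextBatch, and the limit of 50 migrations per pass are all hypothetical names and values. The point is that only a bounded batch of tablets is unloaded per balancing pass, so the remaining tablets stay hosted on their current tservers until their turn comes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical helper for illustration; not part of Accumulo's balancer API. */
final class MigrationThrottle {

    /**
     * Removes at most maxPerPass entries from the pending queue and returns
     * them as the batch to migrate in this balancing pass. Entries left in
     * the queue are not touched, i.e. those tablets would stay online.
     */
    static <T> List<T> nextBatch(Queue<T> pending, int maxPerPass) {
        if (maxPerPass <= 0) {
            throw new IllegalArgumentException("maxPerPass must be positive");
        }
        List<T> batch = new ArrayList<>();
        while (batch.size() < maxPerPass && !pending.isEmpty()) {
            batch.add(pending.poll());
        }
        return batch;
    }

    public static void main(String[] args) {
        // 200 pending migrations, as in the master.log excerpt above.
        Queue<String> pending = new ArrayDeque<>();
        for (int i = 0; i < 200; i++) {
            pending.add("tablet-" + i);
        }
        int pass = 0;
        while (!pending.isEmpty()) {
            // Assumed limit of 50 per pass; a real balancer would also wait
            // for the previous batch to be loaded before starting the next.
            List<String> batch = nextBatch(pending, 50);
            pass++;
            System.out.println("pass " + pass + ": migrating " + batch.size()
                + ", " + pending.size() + " still hosted");
        }
    }
}
```

With a cap like this, the number of tablets offline at any moment is bounded by the batch size rather than by the total number of migrations, at the cost of a longer overall rebalance.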



> Adding a new tserver puts some tables offline for few minutes
> -------------------------------------------------------------
>
>                 Key: ACCUMULO-3471
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3471
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 12.04
>            Reporter: Denis Petrov
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: ACCUMULO-3471-balance-test.patch
>
>
> I run an Accumulo cluster of 15 tservers with about 6000 tablets on each (the disks are
> quite slow: each node has 2 x 4 TB SATA).
> When a new tserver is added to the cluster, the rebalancing procedure starts.
> During this procedure some tablets are offline and unreachable for 5-10 minutes.
> This is visible at http://monitor:50095/tables and as timeouts on the client side.
> Rebalancing caused by killing a tserver converges much faster than rebalancing caused
> by adding a tserver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
