accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2645) tablet stuck unloading
Date Wed, 09 Apr 2014 15:50:19 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964298#comment-13964298 ]

Josh Elser commented on ACCUMULO-2645:
--------------------------------------

bq. the monitor could display the number of unload requests outstanding in the tserver

That would be cool. I could see the general premise being otherwise useful too.

Perhaps related: does a tablet unload interrupt running scans, or can a scan block unloads
indefinitely? Perhaps the tserver should try to unload for some amount of time and, if the
tablet still hasn't unloaded because a scan is running, forcefully abort that scan. That also
raises the question of custom iterators: can we make something that will gracefully abort a
scan using such iterators, or are we reliant on users implementing exception handling properly
to avoid the "9 day query"?
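
To make the "try for a while, then abort" idea concrete, here is a minimal sketch in plain
java.util.concurrent. It is not Accumulo code: BoundedUnload, Outcome, cooperativeScan and
unloadWithDeadline are hypothetical names, and the "scan" is only a loop that sleeps. It also
assumes the scan has already started running when the unload begins. The STUCK outcome is the
case asked about above, where an iterator ignores the interrupt and a graceful abort is not
possible.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Accumulo code base.
class BoundedUnload {

  enum Outcome { FINISHED, ABORTED, STUCK }

  /** Stand-in for a long-running scan that honors interruption between "rows". */
  static Runnable cooperativeScan(CountDownLatch finished) {
    return () -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          // Stand-in for the per-entry work a custom iterator would do.
          Thread.sleep(10);
        }
      } catch (InterruptedException e) {
        // Restore the flag so callers further up can see the interrupt.
        Thread.currentThread().interrupt();
      } finally {
        // Signal that the scan no longer holds the (imaginary) tablet.
        finished.countDown();
      }
    };
  }

  /**
   * Waits up to graceMillis for the scan to finish by itself. If it has not,
   * interrupts it and waits up to graceMillis again for it to react.
   */
  static Outcome unloadWithDeadline(Future<?> scan, CountDownLatch finished, long graceMillis)
      throws InterruptedException {
    if (finished.await(graceMillis, TimeUnit.MILLISECONDS)) {
      return Outcome.FINISHED;
    }
    // cancel(true) interrupts the thread that is running the scan.
    scan.cancel(true);
    if (finished.await(graceMillis, TimeUnit.MILLISECONDS)) {
      return Outcome.ABORTED;
    }
    // The scan ignored the interrupt; this sketch cannot force it to stop.
    return Outcome.STUCK;
  }

  /** Runs one scan that never ends by itself and unloads with the given grace period. */
  static Outcome demo(long graceMillis) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      CountDownLatch finished = new CountDownLatch(1);
      Future<?> scan = pool.submit(cooperativeScan(finished));
      return unloadWithDeadline(scan, finished, graceMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for unload", e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Calling BoundedUnload.demo(100) should yield ABORTED, because the stand-in scan reacts to the
interrupt. A scan whose iterator swallows InterruptedException and never checks the interrupt
flag would end up as STUCK instead, which is why the exception-handling question matters.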

> tablet stuck unloading
> ----------------------
>
>                 Key: ACCUMULO-2645
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2645
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.4.4
>         Environment: very large production cluster, CDH3u5
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>              Labels: newbie
>             Fix For: 1.7.0
>
>
>  * master failed to balance
>  * custom balancer refused to balance while migrations were in place
>  * tablet server was not unloading the tablet
>  * tablet server was otherwise serving tablets, providing status
>  * memory dump determined that there were 21K UnloadTabletHandler objects
>  * jstack showed UnloadTabletHandler in Tablet.completeClose, line 2674
>  * the last print of the debug message "completeClose(safeState=true, completeClose=true)" occurred 9 days ago
>  * there was a query that had been running for 9 days



--
This message was sent by Atlassian JIRA
(v6.2#6252)
