accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2645) tablet stuck unloading
Date Wed, 09 Apr 2014 16:29:14 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964344#comment-13964344 ]

Josh Elser commented on ACCUMULO-2645:
--------------------------------------

bq. I am writing a test for this theory.

Neat!

bq.  Current theory is that this interrupt is being caught by the HDFS library, which indirectly
causes the request to the NN to hang forever. 

Yeah, this is what I was getting at. I wonder if there is something we could design into SKVI
or the interruption call to ensure the interruption actually propagates, so that the scan sees
it and takes action. Just a thought.
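
To make the thought a bit more concrete, here is a minimal sketch of one way a scan could be made to notice an interruption even when the thread interrupt itself gets swallowed by a library. It wraps a plain java.util.Iterator rather than a real SKVI, and the names InterruptCheckingIterator and ScanInterruptedException are hypothetical, made up for illustration; this is not Accumulo's actual API or a proposed patch.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical wrapper (not an Accumulo class): checks a shared flag before
 * every call into the wrapped iterator, so a cancelled scan stops at the next
 * iterator call even if a thread interrupt was caught and dropped elsewhere.
 */
final class InterruptCheckingIterator<T> implements Iterator<T> {

    /** Hypothetical unchecked exception signalling that the scan was cancelled. */
    static final class ScanInterruptedException extends RuntimeException {
        ScanInterruptedException(String message) {
            super(message);
        }
    }

    private final Iterator<T> source;
    private final AtomicBoolean interruptFlag;

    InterruptCheckingIterator(Iterator<T> source, AtomicBoolean interruptFlag) {
        this.source = source;
        this.interruptFlag = interruptFlag;
    }

    // The flag is set by whoever wants the scan to stop (for example, the
    // code closing the tablet). It stays set until someone clears it, unlike
    // a thread's interrupt status, which is cleared when the interrupt is caught.
    private void checkInterrupted() {
        if (interruptFlag.get()) {
            throw new ScanInterruptedException("scan interrupted");
        }
    }

    @Override
    public boolean hasNext() {
        checkInterrupted();
        return source.hasNext();
    }

    @Override
    public T next() {
        checkInterrupted();
        return source.next();
    }
}
```

The caveat is that a check like this only fires when control returns to the iterator. A thread that is already blocked inside a call that never returns would not reach it, so it would not by itself fix a request that hangs forever.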

> tablet stuck unloading
> ----------------------
>
>                 Key: ACCUMULO-2645
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2645
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.4.4
>         Environment: very large production cluster, CDH3u5
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>              Labels: newbie
>             Fix For: 1.7.0
>
>
>  * master failed to balance
>  * custom balancer refused to balance while migrations were in place
>  * tablet server was not unloading the tablet
>  * tablet server was otherwise serving tablets, providing status
>  * memory dump determined that there were 21K UnloadTabletHandler objects
>  * jstack showed UnloadTabletHandler in Tablet.completeClose, line 2674
>  * the last print of the debug "completeClose(safeState=true, completeClose=true)" occurred 9 days ago
>  * there was a query that had been running for 9 days



--
This message was sent by Atlassian JIRA
(v6.2#6252)
