accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (ACCUMULO-777) isLockHeld needs better bullet-proofing against transient errors
Date Thu, 11 Oct 2012 15:49:02 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-777.
----------------------------------

    Resolution: Fixed

Fixed in r1397117 and r1397120.
                
> isLockHeld needs better bullet-proofing against transient errors
> ----------------------------------------------------------------
>
>                 Key: ACCUMULO-777
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-777
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.1, 1.3.6, 1.4.0, 1.3.5
>         Environment: medium sized cluster
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.4.2
>
>
> During a minor compaction, the tablet server's ZooKeeper lock is double-checked
> prior to updating the METADATA table information.  In one unlucky moment, the
> ZooKeeper connection was lost during this check.  The tablet server failed the
> check, but the lock was not lost.  As a result, the root tablet remained hosted
> for another 4 weeks, but did not flush mutations to disk.  When memory filled,
> the operator noticed a long hold time and killed the server.  This caused a log
> recovery of 98 1G logs, some of which were very old.
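
The failure mode described above (a connection error during the check being treated the same as a lost lock) can be illustrated with a rough sketch. This is not the change committed in r1397117 or r1397120; it is only one possible way to separate "lock definitely gone" from "could not tell". All names here (LockCheckSketch, LockProbe, TransientLockCheckException, checkLock) are hypothetical and are not Accumulo or ZooKeeper APIs; the probe stands in for whatever call actually inspects the lock node.

```java
public class LockCheckSketch {

    /** Hypothetical stand-in for a transient failure such as a lost connection. */
    static class TransientLockCheckException extends Exception {
        TransientLockCheckException(String message) {
            super(message);
        }
    }

    /** Hypothetical probe: true if the lock node exists, false if it is definitely gone. */
    interface LockProbe {
        boolean lockNodeExists() throws TransientLockCheckException;
    }

    /** UNKNOWN means every attempt failed transiently, so nothing was learned about the lock. */
    enum LockState { HELD, LOST, UNKNOWN }

    /**
     * Retries the probe with exponential backoff. Only a definite answer from
     * the probe yields HELD or LOST; transient errors never count as LOST.
     */
    static LockState checkLock(LockProbe probe, int maxAttempts, long initialBackoffMillis) {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return probe.lockNodeExists() ? LockState.HELD : LockState.LOST;
            } catch (TransientLockCheckException e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts; fall through to UNKNOWN
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return LockState.UNKNOWN;
                }
                backoff *= 2;
            }
        }
        return LockState.UNKNOWN;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated probe: two connection losses, then a definite answer.
        LockProbe flaky = () -> {
            if (calls[0]++ < 2) {
                throw new TransientLockCheckException("connection loss");
            }
            return true;
        };
        System.out.println(checkLock(flaky, 5, 10L));
    }
}
```

A caller would still have to decide what UNKNOWN means for it, for example retrying the METADATA update later rather than abandoning the flush; that policy is outside this sketch.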

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
