accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2313) Accumulo Tablet Server failed to retain lock with ZooKeeper
Date Mon, 03 Feb 2014 20:39:07 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889891#comment-13889891 ]

Josh Elser commented on ACCUMULO-2313:
--------------------------------------

Can you post said logs? The content of those GC log messages can be a very big hint as to
what was going on.

Something else has to be happening; otherwise, you wouldn't be losing the ZooKeeper lock.

> Accumulo Tablet Server failed to retain lock with ZooKeeper
> -----------------------------------------------------------
>
>                 Key: ACCUMULO-2313
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2313
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.5.0
>         Environment: 40 Node Cluster
> Each Node: 64GB RAM, 8 Cores (2.4 GHz) , 4x1.5TB drives, 10 Gb/s Ethernet
>            Reporter: Glenn Primmer
>
> On 3 nodes, the Accumulo Tservers did not communicate with ZooKeeper within the timeout
> period and therefore lost their locks. Looking at the resource utilization (Nagios), it did
> not appear that node CPU/resource utilization was a factor in why the Accumulo Tservers
> did not communicate with ZooKeeper within the timeout period.
> The question is: is there potential thread contention for the thread responsible for retaining
> the ZooKeeper lock in the Accumulo Tservers?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
