accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2313) Accumulo Tablet Server failed to retain lock with ZooKeeper
Date Mon, 03 Feb 2014 20:17:07 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889863#comment-13889863 ]

Josh Elser commented on ACCUMULO-2313:
--------------------------------------

I'm pretty sure that ZooKeeper has its own thread for keeping that lock alive, but I don't remember
exactly where that is anymore. It might be internal to the ZooKeeper client itself.
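To make the "own thread" point concrete, here is a minimal Java sketch, assuming the lock is kept alive by a single dedicated heartbeat thread that is separate from the threads doing the server's real work. `HeartbeatDemo`, its `run` method, the 20 ms interval and the spinning worker threads are all hypothetical; this is not ZooKeeper's or Accumulo's code. In the sketch, heartbeats keep firing while every core is kept busy by worker threads, because the OS still time-slices the heartbeat thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical illustration only: a dedicated heartbeat thread keeps firing
 * while CPU-bound worker threads run. Not ZooKeeper's or Accumulo's code.
 */
public class HeartbeatDemo {

    /** Spins `workers` threads for `busyMillis` and returns how many heartbeats fired meanwhile. */
    public static int run(int workers, long busyMillis, long heartbeatMillis) {
        AtomicInteger beats = new AtomicInteger();

        // One dedicated daemon thread, standing in for a client library's own heartbeat thread.
        ScheduledExecutorService heartbeat = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heartbeat");
            t.setDaemon(true);
            return t;
        });
        heartbeat.scheduleAtFixedRate(beats::incrementAndGet, 0, heartbeatMillis, TimeUnit.MILLISECONDS);

        // CPU-bound workers, standing in for a busy tablet server.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(busyMillis);
        List<Thread> busy = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                while (System.nanoTime() < deadline) {
                    // spin until the deadline passes
                }
            }, "worker-" + i);
            busy.add(t);
            t.start();
        }

        for (Thread t : busy) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status and stop waiting
                break;
            }
        }
        heartbeat.shutdownNow();
        return beats.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int beats = run(cores, 300, 20);
        System.out.println("heartbeats while " + cores + " workers were busy: " + beats);
    }
}
```

The flip side of this design is that a separate thread only protects against contention from other threads; anything that pauses the whole process would pause the heartbeat thread too.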

What did the tserver logs look like in the 5-10 minutes before it ultimately killed itself? Was
there any load on the tservers when they died?

> Accumulo Tablet Server failed to retain lock with ZooKeeper
> -----------------------------------------------------------
>
>                 Key: ACCUMULO-2313
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2313
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.5.0
>         Environment: 40 Node Cluster
> Each Node: 64 GB RAM, 8 cores (2.4 GHz), 4 x 1.5 TB drives, 10 Gb/s Ethernet
>            Reporter: Glenn Primmer
>
> On 3 nodes, the Accumulo tservers did not communicate with ZooKeeper within the timeout
> period and therefore lost their locks. Resource utilization (from Nagios) did not suggest
> that CPU or other resource load on those nodes was the reason the tservers missed the timeout.
> Question: is there potential thread contention for the thread responsible for retaining
> the ZooKeeper lock in the Accumulo tservers?
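The failure described in the issue can be modeled the same hedged way, assuming the server treats the session, and any lock held through it, as lost once it hears nothing from the client for longer than the session timeout. `SessionExpiryModel`, `expiryTime` and the 30 s timeout in the example are hypothetical and not taken from ZooKeeper or Accumulo. The model shows why the contention question matters: a single gap longer than the timeout is enough to lose the lock, even if average utilization over the same window looks normal.

```java
/**
 * Hypothetical model, not ZooKeeper's implementation: the session (and any lock
 * held through it) is treated as expired once the gap since the last heartbeat
 * exceeds the session timeout.
 */
public class SessionExpiryModel {

    /**
     * @param heartbeats    heartbeat times in ms, ascending; the first entry is the session start
     * @param timeoutMillis session timeout in ms
     * @param nowMillis     current time in ms
     * @return the time at which the session expired, or -1 if it is still alive
     */
    public static long expiryTime(long[] heartbeats, long timeoutMillis, long nowMillis) {
        if (heartbeats.length == 0) {
            throw new IllegalArgumentException("need at least one heartbeat");
        }
        for (int i = 1; i < heartbeats.length; i++) {
            // A gap longer than the timeout means the server stopped hearing from us mid-way.
            if (heartbeats[i] - heartbeats[i - 1] > timeoutMillis) {
                return heartbeats[i - 1] + timeoutMillis;
            }
        }
        // Otherwise, check the silence since the most recent heartbeat.
        long last = heartbeats[heartbeats.length - 1];
        return (nowMillis - last > timeoutMillis) ? last + timeoutMillis : -1;
    }

    public static void main(String[] args) {
        // Heartbeats every 10 s against a 30 s timeout: never expires.
        System.out.println("steady heartbeats: "
                + expiryTime(new long[] {0, 10_000, 20_000, 30_000}, 30_000, 40_000));
        // A 45 s stall between heartbeats: expires 30 s after the heartbeat before the stall.
        System.out.println("45 s stall: "
                + expiryTime(new long[] {0, 10_000, 55_000}, 30_000, 60_000));
    }
}
```

Running `main` prints `steady heartbeats: -1` and `45 s stall: 40000`.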



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
