zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeremy Stribling <st...@nicira.com>
Subject watchers not fired after a disk failure?
Date Mon, 05 Mar 2012 22:47:38 GMT
I have been investigating an issue at one of our customers (our product 
embeds Zookeeper in it), and here's a summary of what I've been able to 
discern:

* One of the servers in a 3-server ZK cluster (ZK 3.3.3 with some 
patches, C client) experiences a hardware/firmware failure of its RAID 
partition (possibly affecting the network card as well).  From 
/var/log/syslog:

-------------------------------
Feb 24 01:53:54 controller3 kernel: [881855.148384] megaraid_sas 
0000:03:00.0: vpd r/w failed.  This is likely a firmware bug on this 
device.  Contact the card vendor for a firmware update.
Feb 24 01:53:54 controller3 kernel: [1398904.161148] bnx2 0000:01:00.0: 
irq 73 for MSI/MSI-X
Feb 24 01:53:54 controller3 kernel: [1398904.161157] bnx2 0000:01:00.0: 
irq 74 for MSI/MSI-X
Feb 24 01:53:54 controller3 kernel: [1398904.161165] bnx2 0000:01:00.0: 
irq 75 for MSI/MSI-X
Feb 24 01:53:54 controller3 kernel: [1398904.161173] bnx2 0000:01:00.0: 
irq 76 for MSI/MSI-X
Feb 24 01:53:54 controller3 kernel: [1398904.161181] bnx2 0000:01:00.0: 
irq 77 for MSI/MSI-X
Feb 24 01:53:54 controller3 kernel: [1398904.161188] bnx2 0000:01:00.0: 
irq 78 for MSI/MSI-X
Feb 24 01:53:54 controller3 kernel: [1398904.161196] bnx2 0000:01:00.0: 
irq 79 for MSI/MSI-X
Feb 24 01:53:54 controller3 kernel: [1398904.161203] bnx2 0000:01:00.0: 
irq 80 for MSI/MSI-X
Feb 24 01:53:54 controller3 kernel: [1398904.161210] bnx2 0000:01:00.0: 
irq 81 for MSI/MSI-X
Feb 24 01:53:54 controller3 kernel: [1398904.241931] bnx2 0000:01:00.0: 
eth0: using MSIX
Feb 24 01:53:54 controller3 kernel: [1398904.243302] 
ADDRCONF(NETDEV_UP): eth0: link is not ready
Feb 24 01:53:57 controller3 kernel: [1398907.309739] bnx2 0000:01:00.0: 
eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit 
flow control ON
Feb 24 01:53:57 controller3 kernel: [1398907.311239] 
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Feb 24 01:55:06 controller3 ntpdate[10764]: step time server 
72.3.128.241 offset 63.134559 sec
Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: 
name = controller/disk-sda/disk_octets; value time = 1330048506; last 
cache update = 1330048506;
Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in 
target `write': Dispatching value to all write plugins failed with 
status -1.
Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: 
name = controller/disk-sda/disk_ops; value time = 1330048506; last cache 
update = 1330048506;
Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in 
target `write': Dispatching value to all write plugins failed with 
status -1.
...
Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in 
target `write': Dispatching value to all write plugins failed with 
status -1.
Feb 24 01:55:10 controller3 kernel: [1398917.517473] eth0: no IPv6 
routers present
Feb 24 01:55:56 controller3 kernel: [1398962.889915] bnx2 0000:01:00.0: 
irq 73 for MSI/MSI-X
Feb 24 01:55:56 controller3 kernel: [1398962.889924] bnx2 0000:01:00.0: 
irq 74 for MSI/MSI-X
Feb 24 01:55:56 controller3 kernel: [1398962.889932] bnx2 0000:01:00.0: 
irq 75 for MSI/MSI-X
Feb 24 01:55:56 controller3 kernel: [1398962.889939] bnx2 0000:01:00.0: 
irq 76 for MSI/MSI-X
Feb 24 01:55:56 controller3 kernel: [1398962.889946] bnx2 0000:01:00.0: 
irq 77 for MSI/MSI-X
Feb 24 01:55:56 controller3 kernel: [1398962.889953] bnx2 0000:01:00.0: 
irq 78 for MSI/MSI-X
Feb 24 01:55:56 controller3 kernel: [1398962.889960] bnx2 0000:01:00.0: 
irq 79 for MSI/MSI-X
Feb 24 01:55:56 controller3 kernel: [1398962.889968] bnx2 0000:01:00.0: 
irq 80 for MSI/MSI-X
Feb 24 01:55:56 controller3 kernel: [1398962.889975] bnx2 0000:01:00.0: 
irq 81 for MSI/MSI-X
Feb 24 01:55:56 controller3 kernel: [1398962.970592] bnx2 0000:01:00.0: 
eth0: using MSIX
Feb 24 01:55:56 controller3 kernel: [1398962.971959] 
ADDRCONF(NETDEV_UP): eth0: link is not ready
Feb 24 01:55:59 controller3 kernel: [1398966.028629] bnx2 0000:01:00.0: 
eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit 
flow control ON
Feb 24 01:55:59 controller3 kernel: [1398966.030132] 
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Feb 24 01:56:05 controller3 ntpdate[10968]: step time server 
72.3.128.241 offset 0.002667 sec
-------------------------------

* This leads existing ZK client sessions to expire, and those clients 
attempt to make new sessions.
* One client (on a different node) establishes a new session to this 
server at 01:55:56 (coinciding exactly with the second set of IRQ 
messages shown above):

-------------------------------
2012-02-24 
01:55:56,964:17928(0x7fbf0b077700):ZOO_INFO@check_events@1632: session 
establishment complete on server [10.14.237.70:2888], 
sessionId=0xd235ad12815a0000, negotiated timeout=6000
-------------------------------

* This session is able to successfully create ephemeral znodes, but 
watches never fire for the session.  For example, when the session has a 
children watch set on /election/a, and then creates 
/election/a/a_00000001, its watch on /election/a never fires (but it 
does fire for sessions coming from other clients).

Unfortunately I cannot reproduce this, and we don't have good logs.  
Because of this, I cannot definitively place the blame on Zookeeper -- 
there are layers of our product above ZK that could theoretically be the 
problem.  But given the evidence I've looked at, ZK seems to be the 
likely culprit.

So my question is: is anyone aware of a failure scenario that would 
cause watches to not be fired, but allow other operations from the same 
session to be successful?  And are there any workarounds?  I don't 
expect much help on this from the community, given the lack of logs and 
evidence, but I just wanted to throw it out there and see if anyone had 
any ideas off the top of their head.  Thanks,

Jeremy


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message