Jeremy, >> * This session is able to successfully create ephemeral znodes, but watches never fire for the session. For example, when the session has a children watch set on /election/a, and then creates /election/a/a_00000001, its watch on /election/a never fires (but it does fire for sessions coming from other clients). Have you tried running wchc on your zookeeper servers to see what paths the session ids are watching ? Thanks, Neha On Mon, Mar 5, 2012 at 2:47 PM, Jeremy Stribling wrote: > I have been investigating an issue at one of our customers (our product > embeds Zookeeper in it), and here's a summary of what I've been able to > discern: > > * One of the servers in a 3-server ZK cluster (ZK 3.3.3 with some patches, C > client) experiences a hardware/firmware failure of its RAID partition > (possibly affecting the network card as well).  From /var/log/syslog: > > ------------------------------- > Feb 24 01:53:54 controller3 kernel: [881855.148384] megaraid_sas > 0000:03:00.0: vpd r/w failed.  This is likely a firmware bug on this device. >  Contact the card vendor for a firmware update. > Feb 24 01:53:54 controller3 kernel: [1398904.161148] bnx2 0000:01:00.0: irq > 73 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161157] bnx2 0000:01:00.0: irq > 74 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161165] bnx2 0000:01:00.0: irq > 75 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161173] bnx2 0000:01:00.0: irq > 76 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161181] bnx2 0000:01:00.0: irq > 77 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161188] bnx2 0000:01:00.0: irq > 78 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161196] bnx2 0000:01:00.0: irq > 79 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161203] bnx2 0000:01:00.0: irq > 80 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.161210] bnx2 0000:01:00.0: irq > 81 for MSI/MSI-X > Feb 24 01:53:54 controller3 kernel: [1398904.241931] bnx2 0000:01:00.0: > eth0: using MSIX > Feb 24 01:53:54 controller3 kernel: [1398904.243302] ADDRCONF(NETDEV_UP): > eth0: link is not ready > Feb 24 01:53:57 controller3 kernel: [1398907.309739] bnx2 0000:01:00.0: > eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow > control ON > Feb 24 01:53:57 controller3 kernel: [1398907.311239] > ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > Feb 24 01:55:06 controller3 ntpdate[10764]: step time server 72.3.128.241 > offset 63.134559 sec > Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name = > controller/disk-sda/disk_octets; value time = 1330048506; last cache update > = 1330048506; > Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in > target `write': Dispatching value to all write plugins failed with status > -1. > Feb 24 01:55:06 controller3 collectd[1691]: uc_update: Value too old: name = > controller/disk-sda/disk_ops; value time = 1330048506; last cache update = > 1330048506; > Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in > target `write': Dispatching value to all write plugins failed with status > -1. > ... > Feb 24 01:55:06 controller3 collectd[1691]: Filter subsystem: Built-in > target `write': Dispatching value to all write plugins failed with status > -1. > Feb 24 01:55:10 controller3 kernel: [1398917.517473] eth0: no IPv6 routers > present > Feb 24 01:55:56 controller3 kernel: [1398962.889915] bnx2 0000:01:00.0: irq > 73 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889924] bnx2 0000:01:00.0: irq > 74 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889932] bnx2 0000:01:00.0: irq > 75 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889939] bnx2 0000:01:00.0: irq > 76 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889946] bnx2 0000:01:00.0: irq > 77 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889953] bnx2 0000:01:00.0: irq > 78 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889960] bnx2 0000:01:00.0: irq > 79 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889968] bnx2 0000:01:00.0: irq > 80 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.889975] bnx2 0000:01:00.0: irq > 81 for MSI/MSI-X > Feb 24 01:55:56 controller3 kernel: [1398962.970592] bnx2 0000:01:00.0: > eth0: using MSIX > Feb 24 01:55:56 controller3 kernel: [1398962.971959] ADDRCONF(NETDEV_UP): > eth0: link is not ready > Feb 24 01:55:59 controller3 kernel: [1398966.028629] bnx2 0000:01:00.0: > eth0: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow > control ON > Feb 24 01:55:59 controller3 kernel: [1398966.030132] > ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > Feb 24 01:56:05 controller3 ntpdate[10968]: step time server 72.3.128.241 > offset 0.002667 sec > ------------------------------- > > * This leads existing ZK client sessions to expire, and those clients > attempt to make new sessions. > * One client (on a different node) establishes a new session to this server > at 01:55:56 (coinciding exactly with the second set of IRQ messages shown > above): > > ------------------------------- > 2012-02-24 01:55:56,964:17928(0x7fbf0b077700):ZOO_INFO@check_events@1632: > session establishment complete on server [10.14.237.70:2888], > sessionId=0xd235ad12815a0000, negotiated timeout=6000 > ------------------------------- > > * This session is able to successfully create ephemeral znodes, but watches > never fire for the session.  For example, when the session has a children > watch set on /election/a, and then creates /election/a/a_00000001, its watch > on /election/a never fires (but it does fire for sessions coming from other > clients). > > Unfortunately I cannot reproduce this, and we don't have good logs.  Because > of this, I cannot definitively place the blame on Zookeeper -- there are > layers of our product above ZK that could theoretically be the problem.  But > given the evidence I've looked at, ZK seems to be the likely culprit. > > So my question is: is anyone aware of a failure scenario that would cause > watches to not be fired, but allow other operations from the same session to > be successful?  And are there any workarounds?  I don't expect much help on > this from the community, given the lack of logs and evidence, but I just > wanted to throw it out there and see if anyone had any ideas off the top of > their head.  Thanks, > > Jeremy >