zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Neha Narkhede <neha.narkh...@gmail.com>
Subject Re: ephemeral node not deleted after client session closed
Date Thu, 10 Nov 2011 22:01:08 GMT
To answer the remaining client-side questions -

>>  Do you have multiple threads sharing a session?
No

>> Q: what are you clients doing? It's weird that a create would come
from the client after the session has been closed.

Here are the client side logs. It seems like the client session kept timing
out.

2011/11/08 11:46:13.423 INFO [ClientCnxn]
[main-SendThread(app27.stg:12913)] [kafka] Client session timed out, have
not heard from server in 4000ms for sessionid 0x43220b966ee3930, closing
socket connection and attempting reconnect
2011/11/08 11:46:13.523 INFO [ZkClient] [main-EventThread] [kafka]
zookeeper state changed (Disconnected)2011/11/08 11:46:13.562 INFO
[ClientCnxn] [main-SendThread(app27.stg:12913)] [kafka] Opening socket
connection to server esv4-app30.stg/172.18.98.90:129
132011/11/08 11:46:13.563 INFO [ClientCnxn]
[main-SendThread(app30.stg:12913)] [kafka] Socket connection established to
esv4-app30.stg/172.18.98.90:12913,
 initiating session2011/11/08 11:46:14.173 INFO [ClientCnxn]
[main-SendThread(app30.stg:12913)] [kafka] Unable to reconnect to ZooKeeper
service, session 0x43220b966ee3930
 has expired, closing socket connection2011/11/08 11:46:14.195 INFO
[ZkClient] [main-EventThread] [kafka] zookeeper state changed (Expired)
2011/11/08 11:46:14.195 INFO [ZooKeeper] [main-EventThread] [kafka]
Initiating client connection,
connectString=app27.stg:12913,esv4-app28.stg:12913,esv4-app29.stg:12913,esv4-app30.stg:12913/kafka-tracking
sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@679a9d19
2011/11/08 11:46:14.263 INFO [ClientCnxn] [main-SendThread()] [kafka]
Opening socket connection to server
app28.stg/172.18.98.101:129132011/11/0811:46:14.263 INFO
[KafkaZooKeeper]
[ZkClient-EventThread-495-esv4-app27.stg:12913,esv4-app28.stg:12913,esv4-app29.stg:12913,esv4-app30.stg:12913/k
afka-tracking] [kafka] re-registering broker info in ZK for broker
5282011/11/08 11:46:14.263 INFO [ClientCnxn] [main-EventThread] [kafka]
EventThread shut down
2011/11/08 11:46:14.263 INFO [KafkaZooKeeper]
[ZkClient-EventThread-495-app27.stg:12913,esv4-app28.stg:12913,esv4-app29.stg:12913,esv4-app30.stg:12913/kafka-tracking]
[kafka] Registering broker /brokers/ids/528
2011/11/08 11:46:14.264 INFO [ClientCnxn]
[main-SendThread(app28.stg:12913)] [kafka] Socket connection established to
esv4-app28.stg/172.18.98.101:12913, initiating session
2011/11/08 11:46:15.764 INFO [ClientCnxn]
[main-SendThread(app28.stg:12913)] [kafka] Client session timed out, have
not heard from server in 1500ms for sessionid 0x0, closing socket
connection and attempting reconnect
2011/11/08 11:46:16.448 INFO [ClientCnxn]
[main-SendThread(app28.stg:12913)] [kafka] Opening socket connection to
server esv4-app29.stg/172.18.98.89:12913
2011/11/08 11:46:16.448 INFO [ClientCnxn]
[main-SendThread(app29.stg:12913)] [kafka] Socket connection established to
esv4-app29.stg/172.18.98.89:12913, initiating session
2011/11/08 11:46:17.948 INFO [ClientCnxn]
[main-SendThread(app29.stg:12913)] [kafka] Client session timed out, have
not heard from server in 1500ms for sessionid 0x0, closing socket
connection and attempting reconnect
2011/11/08 11:46:18.168 INFO [ClientCnxn]
[main-SendThread(app29.stg:12913)] [kafka] Opening socket connection to
server esv4-app30.stg/172.18.98.90:12913
2011/11/08 11:46:18.169 INFO [ClientCnxn]
[main-SendThread(app30.stg:12913)] [kafka] Socket connection established to
esv4-app30.stg/172.18.98.90:12913, initiating session
2011/11/08 11:46:19.669 INFO [ClientCnxn]
[main-SendThread(app30.stg:12913)] [kafka] Client session timed out, have
not heard from server in 1500ms for sessionid 0x0, closing socket
connection and attempting reconnect
2011/11/08 11:46:20.042 INFO [ClientCnxn]
[main-SendThread(app30.stg:12913)] [kafka] Opening socket connection to
server esv4-app27.stg/172.18.98.88:12913
2011/11/08 11:46:20.042 INFO [ClientCnxn]
[main-SendThread(app27.stg:12913)] [kafka] Socket connection established to
esv4-app27.stg/172.18.98.88:12913, initiating session
2011/11/08 11:46:21.543 INFO [ClientCnxn]
[main-SendThread(app27.stg:12913)] [kafka] Client session timed out, have
not heard from server in 1501ms for sessionid 0x0, closing socket
connection and attempting reconnect
2011/11/08 11:46:21.841 INFO [ClientCnxn]
[main-SendThread(app27.stg:12913)] [kafka] Opening socket connection to
server esv4-app28.stg/172.18.98.101:12913
2011/11/08 11:46:21.842 INFO [ClientCnxn]
[main-SendThread(app28.stg:12913)] [kafka] Socket connection established to
esv4-app28.stg/172.18.98.101:12913, initiating session
2011/11/08 11:46:23.343 INFO [ClientCnxn]
[main-SendThread(app28.stg:12913)] [kafka] Client session timed out, have
not heard from server in 1501ms for sessionid 0x0, closing socket
connection and attempting reconnect
2011/11/08 11:46:23.984 INFO [ClientCnxn]
[main-SendThread(app28.stg:12913)] [kafka] Opening socket connection to
server esv4-app29.stg/172.18.98.89:12913
2011/11/08 11:46:23.985 INFO [ClientCnxn]
[main-SendThread(app29.stg:12913)] [kafka] Socket connection established to
esv4-app29.stg/172.18.98.89:12913, initiating session
2011/11/08 11:46:25.486 INFO [ClientCnxn]
[main-SendThread(app29.stg:12913)] [kafka] Client session timed out, have
not heard from server in 1501ms for sessionid 0x0, closing socket
connection and attempting reconnect
2011/11/08 11:46:25.637 INFO [ClientCnxn]
[main-SendThread(app29.stg:12913)] [kafka] Opening socket connection to
server esv4-app30.stg/172.18.98.90:12913
2011/11/08 11:46:25.637 INFO [ClientCnxn]
[main-SendThread(app30.stg:12913)] [kafka] Socket connection established to
esv4-app30.stg/172.18.98.90:12913, initiating session
2011/11/08 11:46:26.065 INFO [ClientCnxn]
[main-SendThread(app30.stg:12913)] [kafka] Session establishment complete
on server esv4-app30.stg/172.18.98.90:12913, sessionid = 0x43220b966ee394a,
negotiated timeout = 6000
2011/11/08 11:46:26.066 INFO [ZkClient] [main-EventThread] [kafka]
zookeeper state changed (SyncConnected)

I will attach our conversation to the bug,.

Thanks,
Neha

On Thu, Nov 10, 2011 at 1:52 PM, Neha Narkhede <neha.narkhede@gmail.com>wrote:

> Thanks for the quick responses, guys! Please find my replies inline -
>
>
> >> 1) Why is the session closed, the client closed it or the cluster
> expired it?
> Cluster expired it.
>
>
> >> 2) which server was the session attached to - the first (44sec max
> lat) or one of the others? Which server was the leader?
> We didn't run srvr, so no idea which server was the leader. I still have
> the zookeeper log4j and txn logs, as well as the client logs. Where do I
> look to find this out ?
>
>
> >> 3) the znode exists on all 4 servers, is that right?
> Yes
>
>
> >> 5) why are your max latencies, as well as avg latency, so high?
> >> a) are these dedicated boxes, not virtualized, correct?
> these are dedicated boxes, but zk is currently co-located with kafka, but
> on different disks
>
>
> >> b) is the jvm going into gc pause? (try turning on verbose logging, or
> use "jstat" with the gc options to see the history if you still have
> those jvms running)
> I don't believe we had gc logs on these machines. So its unclear.
>
>
> >> d) do you have dedicated spindles for the ZK WAL? If not another
> process might be causing the fsyncs to pause. (you can use iostat or
> strace to monitor this)
> No. The log4j and zk txn logs share the same disks.
>
>
> >> Is that the log from the server that's got the 44sec max latency?
> Yes.
>
> >> This is 3.3.3 ?
> Yes.
>
>
> >> was there any instability in the quorum itself during this time
> period?
> How do I find that out ?
>
> Thanks,
> Neha
>
>
> On Thu, Nov 10, 2011 at 1:34 PM, Camille Fournier <camille@apache.org>wrote:
>
>> This is zk 3.3.3?
>>
>> From my phone
>> On Nov 10, 2011 4:02 PM, "Patrick Hunt" <phunt@apache.org> wrote:
>>
>>> On Thu, Nov 10, 2011 at 12:20 PM, Patrick Hunt <phunt@apache.org> wrote:
>>> > Q: what are you clients doing? It's weird that a create would come
>>> > from the client after the session has been closed. Do you have
>>> > multiple threads sharing a session?
>>>
>>> The client (checked java) seems to protect against this. Was the
>>> session expired? IC. Based on the cxid of the close session being 0
>>> I'm guessing it's an expiration.
>>>
>>> Patrick
>>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message