hadoop-zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: Errors during shutdown/startup of ZooKeeper
Date Wed, 03 Jun 2009 18:39:37 GMT
Nitay, any luck? Feel free to create a JIRA to track this. If you point 
to the test code that's experiencing the problem we'll try and take a look.

Patrick

Patrick Hunt wrote:
> This log manifests if the client is running ahead of the server.
> 
> say you have:
> 1) client connects to server A and sees some changes
> 2) client gets disconnected from A and attempts to connect to B
> 3) B can be running behind A by some number of changes (it will 
> eventually catch up)
> 4) client will attempt to connect to another server that's at, or ahead 
> of it's zxid until successful.
> 
> why? this ensures that the client never sees old data, part of the 
> guarantee you are provided when using zk. However since servers in a 
> quorum can run behind (minority) then you might see this.
> 
> It's unusual to see this so many times however. I see that you are 
> running this as part of a junit test. Perhaps that has some impact? Are 
> you shutting down servers, perhaps clearing the datadir and restarting 
> them, w/o closing all of the clients? If your tests are not running in 
> "fork mode" for junit (or multiple tests w/in a junit test class) then 
> old clients can hang around _if not explicitly closed_ and try to 
> re-connect to new servers that you are using for new tests - if the 
> servers are starting fresh (zxid=1) then you can see this alot as the 
> old (zombie) clients cannot connect to the new servers. Perhaps this is 
> what you are seeing?
> 
> Patrick
> 
> Nitay wrote:
>> I see. That helps. However, even as warnings, these go on seemingly
>> endlessly. Why do they not get fixed by themselves? What are we doing 
>> wrong
>> here?
>>
>> On Tue, Jun 2, 2009 at 2:24 PM, Mahadev Konar <mahadev@yahoo-inc.com> 
>> wrote:
>>
>>> Hi Nitay,
>>>  This is not an error but should be a warning. I have opened up a 
>>> jira for
>>> it.
>>>
>>> http://issues.apache.org/jira/browse/ZOOKEEPER-428
>>>
>>>
>>> The message just says that a client is connecting to a server that is
>>> behind
>>> that a server is was connected to earlier. The log should be warn and 
>>> not
>>> error and should be fixed in the next release.
>>>
>>> mahadev
>>>
>>> On 6/2/09 2:12 PM, "Nitay" <nitayj@gmail.com> wrote:
>>>
>>>> Hey guys,
>>>>
>>>> We are getting a lot of messages like this in HBase:
>>>>
>>>> [junit] 2009-06-02 11:57:23,658 ERROR [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(514): Client has seen zxid 0xe our last zxid is 
>>>> 0xd
>>>>
>>>> For more context, the block it usually appears in is:
>>>>
>>>>     [junit] 2009-06-02 13:27:54,083 INFO  [main-SendThread]
>>>> zookeeper.ClientCnxn$SendThread(737): Priming connection to
>>>> java.nio.channels.SocketChannel[connected 
>>>> local=/0:0:0:0:0:0:0:1%0:56511
>>>> remote=localhost/0:0:0:0:0:0:0:1:21810]
>>>>     [junit] 2009-06-02 13:27:54,084 INFO  [main-SendThread]
>>>> zookeeper.ClientCnxn$SendThread(889): Server connection successful
>>>>     [junit] 2009-06-02 13:27:54,093 INFO  [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(532): Connected to /0:0:0:0:0:0:0:1%0:56511 
>>>> lastZxid
>>> 16
>>>>     [junit] 2009-06-02 13:27:54,094 ERROR [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(543): Client has seen zxid 0x10 our last zxid 
>>>> is 0x4
>>>>     [junit] 2009-06-02 13:27:54,094 WARN  [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(444): Exception causing close of session 0x0 
>>>> due to
>>>> java.io.IOException: Client has seen zxid 0x10 our last zxid is 0x4
>>>>     [junit] 2009-06-02 13:27:54,094 DEBUG 
>>>> [NIOServerCxn.Facto777ry:21810]
>>>> server.NIOServerCnxn(447): IOException stack trace
>>>>     [junit] java.io.IOException: Client has seen zxid 0x10 our last 
>>>> zxid
>>> is
>>>> 0x4
>>>>     [junit]     at
>>>>
>>> org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.jav

>>>
>>>> a:544)
>>>>     [junit]     at
>>>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:331)
>>>>     [junit]     at
>>>>
>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:176)

>>>
>>>>     [junit] 2009-06-02 13:27:54,094 INFO  [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(777): closing session:0x0 NIOServerCnxn:
>>>> java.nio.channels.SocketChannel[connected 
>>>> local=/0:0:0:0:0:0:0:1%0:21810
>>>> remote=/0:0:0:0:0:0:0:1%0:56511]
>>>>     [junit] 2009-06-02 13:27:54,097 WARN  [main-SendThread]
>>>> zookeeper.ClientCnxn$SendThread(919): Exception closing session
>>>> 0x121a2a7c43a0002 to sun.nio.ch.SelectionKeyImpl@2c662b4e
>>>>     [junit] java.io.IOException: Read error rc = -1
>>>> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>>>>     [junit]     at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
>>>>     [junit]     at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
>>>>     [junit] 2009-06-02 13:27:54,097 WARN  [main-SendThread]
>>>> zookeeper.ClientCnxn$SendThread(953): Ignoring exception during 
>>>> shutdown
>>>> input
>>>>     [junit] java.net.SocketException: Socket is not connected
>>>>     [junit]     at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>>>     [junit]     at
>>>> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>>>>     [junit]     at
>>>> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>>     [junit]     at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:951)
>>>>     [junit]     at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>>>>
>>>>
>>>> This happens in a seemingly endless loop. We are not quite sure what it
>>>> means. Can someone help shed some light on these messages?
>>>>
>>>> Thanks,
>>>> -n
>>>
>>

Mime
View raw message