hadoop-zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: permanent ZSESSIONMOVED
Date Tue, 16 Mar 2010 19:05:20 GMT
We'll probably need the ZK server/client logs to hunt this down. Can you 
tell if the MOVED happens in some particular scenario, say you are 
connected to a follower and move to a leader, or perhaps you are 
connected to server A, get disconnected and reconnected to server A? 
.... is there some pattern that could help us understand what's causing 
this?

 > Session gets broken after zookeeper library reconnects to cluster,
 > although most of reconnection leave session if working state.

I'm not sure I follow this ^^^, could you restate it? (I get the first 
about about the session is broken after reconnect, what's the second 
part saying?)

Regards,

Patrick

Łukasz Osipiuk wrote:
> On Tue, Mar 16, 2010 at 19:22, Patrick Hunt <phunt@apache.org> wrote:
>> Łukasz Osipiuk wrote:
>>> On Tue, Mar 16, 2010 at 17:18, Patrick Hunt <phunt@apache.org> wrote:
>>>> Can you verify that you are using 3.2.2 on all servers? You can do this
>>>> by
>>>> running the "stat" command against each of your servers and look at the
>>>> very
>>>> top of the output (we include the version of zk server there).
>>>> http://bit.ly/dglVld
>>> Thanks for hint.
>>>
>>> We had 3.2.2 jars installed on all server - I checked it previously
>>> but apparently our script which restarts instance
>>> after installing .deb does not work very well and node2 and node3 were
>>> running version 3.2.1.
>>> Could using 3.2.1 be source of the problem?
>>>
>> I heard this story where customer service for dell asks customers to replug
>> in the mouse because it might be "a dusty connection". Invariably when ppl
>> look at the back of the box they find that the mouse is not actually plugged
>> in. Brilliant ideas from CS perspective. ;-)
>>
>> I think that issue was fixed in 3.2.1, so I don't think that's an issue
>> (version).
> 
> Damn :(
> 
>>>> Are you using synchronous or async operations in your client?
>>> We are only using async operations.
>> Ok. good info.
>>
>>>> I see snippets from the server logs, but I don't see anything from you
>>>> client logs. It would really help if you could zip up all the logs
>>>> (server
>>>> and client) and provide them.
>>> We do not really have much logging on application side :(. Failed
>>> operations are logged with error code (ZSESSIONEXPIRED in our case).
>>> We also log session state changes but we noticed only changes between 1
>>> and 3.
>>> Unfortunately we do not have logs coming directly from zookeeper
>>> library as it logs to stderr/stdout and we do not store any of them at
>>> the moment.
>>>
>> I see, in this case I think that the log messages theyselves are the
>> problem, ie we are warning about this, but it's not a real problem. We warn
>> on things that we don't like but can handle, we error if we
>> don'tlike/can'thandle.
>>
>> MOVED basically means that a client initiated a request on one server, then
>> moved to another server before the request could be processed.
>>
>> In your case you are using async requests, that can actually make this
>> problem show up more (more "in flight" requests vs sync operation).
>>
>> So if you are not seeing MOVED on the client then this is fine. If you see
>> them in the server log it means what I said above.
>>
>> That make sense, or am I missing something?
> 
> It seems I was what I wrote was not clear. We are noticing errors on
> client side. Any operation after
> session gets into broken state fails with ZSESSIONMOVE error code (I
> wrote ZSESSIONEXPIRED by mistake in previous post).
> We get ZSESSIONMOVE from callback on rc parameter.
> 
> Session gets broken after zookeeper library reconnects to cluster,
> although most of reconnection leave session if working state.
> 
> Regards, Łukasz
> 
> 
>> Patrick
>>
>>
>>
>>> Regards, Łukasz
>>>
>>>> Patrick
>>>>
>>>> Łukasz Osipiuk wrote:
>>>>> not really - it happens occasionally - every few days :(
>>>>> I believe it is somewhat connected with our network environment which
>>>>> suffers from some packet loss which leads to
>>>>> connection timeouts.
>>>>>
>>>>> I can switch on some more logging if you can lead me which categories
>>>>> are worth to enable DEBUG for them.
>>>>>
>>>>> Regards, Łukasz Osipiuk
>>>>>
>>>>> On Tue, Mar 16, 2010 at 16:35, Benjamin Reed <breed@yahoo-inc.com>
>>>>> wrote:
>>>>>> weird, this does sound like a bug. do you have a reliable way of
>>>>>> reproducing
>>>>>> the problem?
>>>>>>
>>>>>> thanx
>>>>>> ben
>>>>>>
>>>>>> On 03/16/2010 08:27 AM, Łukasz Osipiuk wrote:
>>>>>>> nope.
>>>>>>>
>>>>>>> I always pass 0 as clientid.
>>>>>>>
>>>>>>> Łukasz
>>>>>>>
>>>>>>> On Tue, Mar 16, 2010 at 16:20, Benjamin Reed<breed@yahoo-inc.com>
>>>>>>>  wrote:
>>>>>>>
>>>>>>>> do you ever use zookeeper_init() with the clientid field
set to
>>>>>>>> something
>>>>>>>> other than null?
>>>>>>>>
>>>>>>>> ben
>>>>>>>>
>>>>>>>> On 03/16/2010 07:43 AM, Łukasz Osipiuk wrote:
>>>>>>>>
>>>>>>>>> Hi everyone!
>>>>>>>>>
>>>>>>>>> I am writing to this group because recently we are getting
some
>>>>>>>>> strange errors with our production zookeeper setup.
>>>>>>>>>
>>>>>>>>>  From time to time we are observing that our client application
(C++
>>>>>>>>> based) disconnects from zookeeper (session state is changed
to 1)
>>>>>>>>> and
>>>>>>>>> reconnects (state changed to 3).
>>>>>>>>> This itself is not a problem - usually application continues
to run
>>>>>>>>> without problems after reconnect.
>>>>>>>>> But from time to time after above happens all subsequent
operations
>>>>>>>>> start to return ZSESSIONMOVED error. To make it work
again we have
>>>>>>>>> to
>>>>>>>>> restart application (which creates new zookeeper session).
>>>>>>>>>
>>>>>>>>> I noticed that in 3.2.0 introduced a bug
>>>>>>>>> http://issues.apache.org/jira/browse/ZOOKEEPER-449 but
we are using
>>>>>>>>> zookeeper v. 3.2.2.
>>>>>>>>> I just noticed that app at compile time used 3.2.0 library
but
>>>>>>>>> patches
>>>>>>>>> fixing bug 449 did not touch C client lib so I believe
that our
>>>>>>>>> problems are not
>>>>>>>>> related with that.
>>>>>>>>>
>>>>>>>>> In zookeeper logs at moment which initiated the problem
with client
>>>>>>>>> application I have
>>>>>>>>>
>>>>>>>>> node1:
>>>>>>>>> 2010-03-16 14:21:43,510 - INFO
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@607] - Connected
to
>>>>>>>>> /10.1.112.61:37197 lastZxid 42992576502
>>>>>>>>> 2010-03-16 14:21:43,510 - INFO
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@636] - Renewing
session
>>>>>>>>> 0x324dcc1ba580085
>>>>>>>>> 2010-03-16 14:21:49,443 - INFO
>>>>>>>>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@992]
- Finished init
>>>>>>>>> of 0x324dcc1ba580085 valid:true
>>>>>>>>> 2010-03-16 14:21:49,443 - WARN
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@518] - Exception
causing
>>>>>>>>> close of session 0x324dcc1ba580085 due to java.io.IOException:
Read
>>>>>>>>> error
>>>>>>>>> 2010-03-16 14:21:49,444 - INFO
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@857] - closing
>>>>>>>>> session:0x324dcc1ba580085 NIOServerCnxn:
>>>>>>>>> java.nio.channels.SocketChannel[connected local=/10.1.112.62:2181
>>>>>>>>> remote=/10.1.112.61:37197]
>>>>>>>>>
>>>>>>>>> node2:
>>>>>>>>> 2010-03-16 14:21:40,580 - WARN
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@494] - Exception
causing
>>>>>>>>> close of session 0x324dcc1ba580085 due to java.io.IOException:
Read
>>>>>>>>> error
>>>>>>>>> 2010-03-16 14:21:40,581 - INFO
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@833] - closing
>>>>>>>>> session:0x324dcc1ba580085 NIOServerCnxn:
>>>>>>>>> java.nio.channels.SocketChannel[connected local=/10.1.112.63:2181
>>>>>>>>> remote=/10.1.112.61:60693]
>>>>>>>>> 2010-03-16 14:21:46,839 - INFO
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@583] - Connected
to
>>>>>>>>> /10.1.112.61:48336 lastZxid 42992576502
>>>>>>>>> 2010-03-16 14:21:46,839 - INFO
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@612] - Renewing
session
>>>>>>>>> 0x324dcc1ba580085
>>>>>>>>> 2010-03-16 14:21:49,439 - INFO
>>>>>>>>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@964]
- Finished init
>>>>>>>>> of 0x324dcc1ba580085 valid:true
>>>>>>>>>
>>>>>>>>> node3:
>>>>>>>>> 2010-03-16 02:14:48,961 - WARN
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@494] - Exception
causing
>>>>>>>>> close of session 0x324dcc1ba580085 due to java.io.IOException:
Read
>>>>>>>>> error
>>>>>>>>> 2010-03-16 02:14:48,962 - INFO
>>>>>>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@833] - closing
>>>>>>>>> session:0x324dcc1ba580085 NIOServerCnxn:
>>>>>>>>> java.nio.channels.SocketChannel[connected local=/10.1.112.64:2181
>>>>>>>>> remote=/10.1.112.61:57309]
>>>>>>>>>
>>>>>>>>> and then lots of entries like this
>>>>>>>>> 2010-03-16 02:14:54,696 - WARN
>>>>>>>>> [ProcessThread:-1:PrepRequestProcessor@402] - Got exception
when
>>>>>>>>> processing sessionid:0x324dcc1ba580085 type:create cxid:0x4b9e9e49
>>>>>>>>> zxid:0xfffffffffffffffe txntype:unknown
>>>>>>>>> /locks/9871253/lock-8589943989-
>>>>>>>>> org.apache.zookeeper.KeeperException$SessionMovedException:
>>>>>>>>> KeeperErrorCode = Session moved
>>>>>>>>>       at
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> org.apache.zookeeper.server.SessionTrackerImpl.checkSession(SessionTrackerImpl.java:231)
>>>>>>>>>       at
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:211)
>>>>>>>>>       at
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>>>>>>>>> 2010-03-16 14:22:06,428 - WARN
>>>>>>>>> [ProcessThread:-1:PrepRequestProcessor@402] - Got exception
when
>>>>>>>>> processing sessionid:0x324dcc1ba580085 type:create cxid:0x4b9f6603
>>>>>>>>> zxid:0xfffffffffffffffe txntype:unknown
>>>>>>>>> /locks/1665960/lock-8589961006-
>>>>>>>>> org.apache.zookeeper.KeeperException$SessionMovedException:
>>>>>>>>> KeeperErrorCode = Session moved
>>>>>>>>>       at
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> org.apache.zookeeper.server.SessionTrackerImpl.checkSession(SessionTrackerImpl.java:231)
>>>>>>>>>       at
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:211)
>>>>>>>>>       at
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> To workaround disconnections I am going to increase session
timeout
>>>>>>>>> from 5 to 15 seconds but event if it helps at all it
is just a
>>>>>>>>> workaround.
>>>>>>>>>
>>>>>>>>> Do you have an idea where is the source of my problem.
>>>>>>>>>
>>>>>>>>> Regards, Łukasz Osipiuk
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>
>>>
> 
> 
> 

Mime
View raw message