hadoop-zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: Getting a "node exists" code on a sequence create
Date Wed, 03 Nov 2010 18:37:23 GMT
yes, i think you have summarized the problem nicely jeremy.

i'm curious about your reasoning for running servers in standalone mode 
and then merging. can you explain that a bit more?

thanx
ben

On 11/01/2010 04:51 PM, Jeremy Stribling wrote:
> I think this is caused by stupid behavior on our application's part, and
> the error message just confused me.  Here's what I think is happening.
>
> 1) 3 servers are up and accepting data, creating sequential znodes under
> /zkrsm.
> 2) 1 server dies, the other 2 continue creating sequential znodes.
> 3) The 1st server restarts, but instead of joining the other 2 servers,
> it starts an instance by itself, knowing only about the znodes created
> before it died.  [This is a bug in our application -- it is supposed to
> join the other 2 servers in their cluster.]
> 4) Another server (#2) dies and restarts, joining the cluster of server
> #1.  It knows about more sequential znodes under /zkrsm than server #1.
> 5) At this point, trying to create a new znode in the #1-#2 cluster
> might be problematic, because servers #1 and #2 know about different
> sets of znode.  If #1 allocates what it thinks is a new sequential
> number for a new znode, it could be one already used by server #2, and
> hence a "node exists" code might be returned.
>
> So, in summary, our application is almost certainly using Zookeeper
> wrong.  Sorry to waste time on the list, but maybe this thread can help
> someone in the future.
>
> (If this explanation sounds totally off-base though, let me know.  I'm
> not 100% certain this is what's happening, but it definitely seems likely.)
>
> Thanks,
>
> Jeremy
>
> On 11/01/2010 02:56 PM, Jeremy Stribling wrote:
>> Yes, every znode in /zkrsm was created with the sequence flag.
>>
>> We bring up a cluster of three nodes, though we do it in a slightly
>> odd manner to support dynamism: each node starts up as a single-node
>> instance knowing only itself, and then each node is contacted by a
>> coordinator that kills the ZooKeeperServer object and starts a new
>> QuorumPeer object using the full list of three servers.  I know this
>> is weird; perhaps this has something to do with it.
>>
>> Other than the weird setup behavior, we are just writing a few
>> sequential records into the system (which all seems to work fine),
>> killing one of the nodes (one that has been elected leader via the
>> standard recommended ZK leader election algorithm), restarting it, and
>> then trying to create more sequential znodes.  I'm guessing this is
>> pretty well-tested behavior, so there must be something weird or wrong
>> about the way I have stuff setup.
>>
>> I'm happy to provide whatever logs or snapshots might help someone
>> track this down.  Thanks,
>>
>> Jeremy
>>
>> On 11/01/2010 02:42 PM, Benjamin Reed wrote:
>>> how were you able to reproduce it?
>>>
>>> all the znodes in /zkrsm were created with the sequence flag. right?
>>>
>>> ben
>>>
>>> On 11/01/2010 02:28 PM, Jeremy Stribling wrote:
>>>> We were able to reproduce it.  A "stat" on all three servers looks
>>>> identical:
>>>>
>>>> [zk:<ip:port>(CONNECTED) 0] stat /zkrsm
>>>> cZxid = 9
>>>> ctime = Mon Nov 01 13:01:57 PDT 2010
>>>> mZxid = 9
>>>> mtime = Mon Nov 01 13:01:57 PDT 2010
>>>> pZxid = 12884902218
>>>> cversion = 177
>>>> dataVersion = 0
>>>> aclVersion = 0
>>>> ephemeralOwner = 0
>>>> dataLength = 0
>>>> numChildren = 177
>>>>
>>>> Creating a sequential node through the command line also fails:
>>>>
>>>> [zk:<ip:port>(CONNECTED) 1] create -s /zkrsm/0000000000000000_record
>>>> testdata
>>>> Node already exists: /zkrsm/0000000000000000_record
>>>>
>>>> One potentially interesting thing is that numChildren above is 177,
>>>> though I have sequence numbers on that record prefix up to 214 or so.
>>>> There seem to be some gaps though -- I thin "ls /zkrsm" only shows
>>>> about
>>>> 177.  Not sure if that's relevant or not.
>>>>
>>>> Thanks,
>>>>
>>>> Jeremy
>>>>
>>>> On 11/01/2010 12:06 PM, Jeremy Stribling wrote:
>>>>> Thanks for the reply.  It happened every time we called create, not
>>>>> just once.  More than that, we tried restarting each of the nodes in
>>>>> the system (one-by-one), including the new master, and the problem
>>>>> continued.
>>>>>
>>>>> Unfortunately we cleaned everything up, and it's not in that state
>>>>> anymore.  We haven't yet tried to reproduce, but I will try and report
>>>>> back if I can get any "cversion" info.
>>>>>
>>>>> Jeremy
>>>>>
>>>>> On 11/01/2010 11:33 AM, Patrick Hunt wrote:
>>>>>> Hi Jeremy, this sounds like a bug to me, I don't think you should
be
>>>>>> getting the nodeexists when the sequence flag is set.
>>>>>>
>>>>>> Looking at the code briefly we use the parent's "cversion"
>>>>>> (incremented each time the child list is changed, added/removed).
>>>>>>
>>>>>> Did you see this error each time you called create, or just once?
If
>>>>>> you look at the cversion in the Stat of the znode "/zkrsm" on each
of
>>>>>> the servers what does it show? You can use the java CLI to connect
to
>>>>>> each of your servers and access this information. It would be
>>>>>> interesting to see if the data was out of sync only for a short
>>>>>> period
>>>>>> of time, or forever. Is this repeatable?
>>>>>>
>>>>>> Ben/Flavio do you see anything here?
>>>>>>
>>>>>> Patrick
>>>>>>
>>>>>> On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Stribling<strib@nicira.com>
>>>>>> wrote:
>>>>>>> HI everyone,
>>>>>>>
>>>>>>> Is there any situation in which creating a new ZK node with the
>>>>>>> SEQUENCE
>>>>>>> flag should result in a "node exists" error?  I'm seeing this
>>>>>>> happening
>>>>>>> after a failure of a ZK node that appeared to have been the master;
>>>>>>> when the
>>>>>>> new master takes over, my app is unable to create a new SEQUENCE
>>>>>>> node under
>>>>>>> an existing parent node.  I'm using Zookeeper 3.2.2.
>>>>>>>
>>>>>>> Here's a representative log snippet:
>>>>>>>
>>>>>>> --------------------------
>>>>>>> 3050756 [ProcessThread:-1] TRACE
>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor  -
>>>>>>> :Psessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>> zxid:0xfffffffffffffffe txntype:unknown
>>>>>>> /zkrsm/0000000000000000_record
>>>>>>> 3050756 [ProcessThread:-1] WARN
>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor  - Got exception
>>>>>>> when
>>>>>>> processing sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>> zxid:0xfffffffffffffffe txntype:unknown n/a
>>>>>>> org.apache.zookeeper.KeeperException$NodeExistsException:
>>>>>>> KeeperErrorCode =
>>>>>>> NodeExists
>>>>>>>           at
>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>>>>>>>
>>>>>>>
>>>>>>>           at
>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>>>>>>>
>>>>>>>
>>>>>>> 3050756 [ProcessThread:-1] DEBUG
>>>>>>> org.apache.zookeeper.server.quorum.CommitProcessor  - Processing
>>>>>>> request::
>>>>>>> sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>> zxid:0x50000027e
>>>>>>> txntype:-1 n/a
>>>>>>> 3050756 [ProcessThread:-1] DEBUG
>>>>>>> org.apache.zookeeper.server.quorum.Leader
>>>>>>>     - Proposing:: sessionid:0x12bf518350f0001 type:create
>>>>>>> cxid:0x4cca0691
>>>>>>> zxid:0x50000027e txntype:-1 n/a
>>>>>>> 3050756 [SyncThread:0] TRACE
>>>>>>> org.apache.zookeeper.server.quorum.Leader  -
>>>>>>> Ack zxid: 0x50000027e
>>>>>>> 3050757 [SyncThread:0] TRACE
>>>>>>> org.apache.zookeeper.server.quorum.Leader  -
>>>>>>> outstanding proposal: 0x50000027e
>>>>>>> 3050757 [SyncThread:0] TRACE
>>>>>>> org.apache.zookeeper.server.quorum.Leader  -
>>>>>>> outstanding proposals all
>>>>>>> 3050757 [SyncThread:0] DEBUG
>>>>>>> org.apache.zookeeper.server.quorum.Leader  -
>>>>>>> Count for zxid: 0x50000027e is 1
>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
>>>>>>> org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x50000027e
>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
>>>>>>> org.apache.zookeeper.server.quorum.Leader  - outstanding proposal:
>>>>>>> 0x50000027e
>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
>>>>>>> org.apache.zookeeper.server.quorum.Leader  - outstanding
>>>>>>> proposals all
>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
>>>>>>> org.apache.zookeeper.server.quorum.Leader  - Count for zxid:
>>>>>>> 0x50000027e is
>>>>>>> 2
>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
>>>>>>> org.apache.zookeeper.server.quorum.CommitProcessor  - Committing
>>>>>>> request::
>>>>>>> sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>> zxid:0x50000027e
>>>>>>> txntype:-1 n/a
>>>>>>> 3050757 [CommitProcessor:0] DEBUG
>>>>>>> org.apache.zookeeper.server.FinalRequestProcessor  - Processing
>>>>>>> request::
>>>>>>> sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>> zxid:0x50000027e
>>>>>>> txntype:-1 n/a
>>>>>>> 3050757 [CommitProcessor:0] TRACE
>>>>>>> org.apache.zookeeper.server.FinalRequestProcessor  -
>>>>>>> :Esessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>> zxid:0x50000027e
>>>>>>> txntype:-1 n/a
>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
>>>>>>> org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x50000027e
>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
>>>>>>> org.apache.zookeeper.server.quorum.Leader  - outstanding
>>>>>>> proposals all
>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:41062] DEBUG
>>>>>>> org.apache.zookeeper.server.quorum.Leader  - outstanding is 0
>>>>>>> --------------------------
>>>>>>>
>>>>>>> I'm still a n00b at understanding ZK log messages, so maybe there's
>>>>>>> something obvious going on.  I looked in the JIRA and did my
best to
>>>>>>> search
>>>>>>> the mailing list archives, but couldn't find anything related
to
>>>>>>> this.  Any
>>>>>>> ideas?  Thanks very much,
>>>>>>>
>>>>>>> Jeremy
>>>>>>>


Mime
View raw message