hadoop-zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeremy Stribling <st...@nicira.com>
Subject Re: Getting a "node exists" code on a sequence create
Date Wed, 03 Nov 2010 19:23:37 GMT
We're building a product for users that aren't used to deploying 
distributed systems, and we're trying to make it as easy to configure 
and use as possible.  That means not requiring the full list of IP 
addresses for every node at configuration time; instead, each node can 
be configured with a single IP of an existing node and the cluster can 
grow dynamically as new nodes come up.  Furthermore, if a new node needs 
to be added to the cluster, it can be done without changing the 
configurations of the existing nodes.

As you might be able to guess, we're really hoping there's progress in 
the future on a version of Zookeeper that can properly support dynamic 
joins, but for now this approach seems to work ok for us.

Jeremy

On 11/03/2010 11:37 AM, Benjamin Reed wrote:
> yes, i think you have summarized the problem nicely jeremy.
>
> i'm curious about your reasoning for running servers in standalone 
> mode and then merging. can you explain that a bit more?
>
> thanx
> ben
>
> On 11/01/2010 04:51 PM, Jeremy Stribling wrote:
>> I think this is caused by stupid behavior on our application's part, and
>> the error message just confused me.  Here's what I think is happening.
>>
>> 1) 3 servers are up and accepting data, creating sequential znodes under
>> /zkrsm.
>> 2) 1 server dies, the other 2 continue creating sequential znodes.
>> 3) The 1st server restarts, but instead of joining the other 2 servers,
>> it starts an instance by itself, knowing only about the znodes created
>> before it died.  [This is a bug in our application -- it is supposed to
>> join the other 2 servers in their cluster.]
>> 4) Another server (#2) dies and restarts, joining the cluster of server
>> #1.  It knows about more sequential znodes under /zkrsm than server #1.
>> 5) At this point, trying to create a new znode in the #1-#2 cluster
>> might be problematic, because servers #1 and #2 know about different
>> sets of znode.  If #1 allocates what it thinks is a new sequential
>> number for a new znode, it could be one already used by server #2, and
>> hence a "node exists" code might be returned.
>>
>> So, in summary, our application is almost certainly using Zookeeper
>> wrong.  Sorry to waste time on the list, but maybe this thread can help
>> someone in the future.
>>
>> (If this explanation sounds totally off-base though, let me know.  I'm
>> not 100% certain this is what's happening, but it definitely seems 
>> likely.)
>>
>> Thanks,
>>
>> Jeremy
>>
>> On 11/01/2010 02:56 PM, Jeremy Stribling wrote:
>>> Yes, every znode in /zkrsm was created with the sequence flag.
>>>
>>> We bring up a cluster of three nodes, though we do it in a slightly
>>> odd manner to support dynamism: each node starts up as a single-node
>>> instance knowing only itself, and then each node is contacted by a
>>> coordinator that kills the ZooKeeperServer object and starts a new
>>> QuorumPeer object using the full list of three servers.  I know this
>>> is weird; perhaps this has something to do with it.
>>>
>>> Other than the weird setup behavior, we are just writing a few
>>> sequential records into the system (which all seems to work fine),
>>> killing one of the nodes (one that has been elected leader via the
>>> standard recommended ZK leader election algorithm), restarting it, and
>>> then trying to create more sequential znodes.  I'm guessing this is
>>> pretty well-tested behavior, so there must be something weird or wrong
>>> about the way I have stuff setup.
>>>
>>> I'm happy to provide whatever logs or snapshots might help someone
>>> track this down.  Thanks,
>>>
>>> Jeremy
>>>
>>> On 11/01/2010 02:42 PM, Benjamin Reed wrote:
>>>> how were you able to reproduce it?
>>>>
>>>> all the znodes in /zkrsm were created with the sequence flag. right?
>>>>
>>>> ben
>>>>
>>>> On 11/01/2010 02:28 PM, Jeremy Stribling wrote:
>>>>> We were able to reproduce it.  A "stat" on all three servers looks
>>>>> identical:
>>>>>
>>>>> [zk:<ip:port>(CONNECTED) 0] stat /zkrsm
>>>>> cZxid = 9
>>>>> ctime = Mon Nov 01 13:01:57 PDT 2010
>>>>> mZxid = 9
>>>>> mtime = Mon Nov 01 13:01:57 PDT 2010
>>>>> pZxid = 12884902218
>>>>> cversion = 177
>>>>> dataVersion = 0
>>>>> aclVersion = 0
>>>>> ephemeralOwner = 0
>>>>> dataLength = 0
>>>>> numChildren = 177
>>>>>
>>>>> Creating a sequential node through the command line also fails:
>>>>>
>>>>> [zk:<ip:port>(CONNECTED) 1] create -s /zkrsm/0000000000000000_record
>>>>> testdata
>>>>> Node already exists: /zkrsm/0000000000000000_record
>>>>>
>>>>> One potentially interesting thing is that numChildren above is 177,
>>>>> though I have sequence numbers on that record prefix up to 214 or so.
>>>>> There seem to be some gaps though -- I thin "ls /zkrsm" only shows
>>>>> about
>>>>> 177.  Not sure if that's relevant or not.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jeremy
>>>>>
>>>>> On 11/01/2010 12:06 PM, Jeremy Stribling wrote:
>>>>>> Thanks for the reply.  It happened every time we called create, not
>>>>>> just once.  More than that, we tried restarting each of the nodes
in
>>>>>> the system (one-by-one), including the new master, and the problem
>>>>>> continued.
>>>>>>
>>>>>> Unfortunately we cleaned everything up, and it's not in that state
>>>>>> anymore.  We haven't yet tried to reproduce, but I will try and 
>>>>>> report
>>>>>> back if I can get any "cversion" info.
>>>>>>
>>>>>> Jeremy
>>>>>>
>>>>>> On 11/01/2010 11:33 AM, Patrick Hunt wrote:
>>>>>>> Hi Jeremy, this sounds like a bug to me, I don't think you 
>>>>>>> should be
>>>>>>> getting the nodeexists when the sequence flag is set.
>>>>>>>
>>>>>>> Looking at the code briefly we use the parent's "cversion"
>>>>>>> (incremented each time the child list is changed, added/removed).
>>>>>>>
>>>>>>> Did you see this error each time you called create, or just 
>>>>>>> once? If
>>>>>>> you look at the cversion in the Stat of the znode "/zkrsm" on

>>>>>>> each of
>>>>>>> the servers what does it show? You can use the java CLI to 
>>>>>>> connect to
>>>>>>> each of your servers and access this information. It would be
>>>>>>> interesting to see if the data was out of sync only for a short
>>>>>>> period
>>>>>>> of time, or forever. Is this repeatable?
>>>>>>>
>>>>>>> Ben/Flavio do you see anything here?
>>>>>>>
>>>>>>> Patrick
>>>>>>>
>>>>>>> On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Stribling<strib@nicira.com>
>>>>>>> wrote:
>>>>>>>> HI everyone,
>>>>>>>>
>>>>>>>> Is there any situation in which creating a new ZK node with
the
>>>>>>>> SEQUENCE
>>>>>>>> flag should result in a "node exists" error?  I'm seeing
this
>>>>>>>> happening
>>>>>>>> after a failure of a ZK node that appeared to have been the

>>>>>>>> master;
>>>>>>>> when the
>>>>>>>> new master takes over, my app is unable to create a new SEQUENCE
>>>>>>>> node under
>>>>>>>> an existing parent node.  I'm using Zookeeper 3.2.2.
>>>>>>>>
>>>>>>>> Here's a representative log snippet:
>>>>>>>>
>>>>>>>> --------------------------
>>>>>>>> 3050756 [ProcessThread:-1] TRACE
>>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor  -
>>>>>>>> :Psessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>>> zxid:0xfffffffffffffffe txntype:unknown
>>>>>>>> /zkrsm/0000000000000000_record
>>>>>>>> 3050756 [ProcessThread:-1] WARN
>>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor  - Got exception
>>>>>>>> when
>>>>>>>> processing sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>>> zxid:0xfffffffffffffffe txntype:unknown n/a
>>>>>>>> org.apache.zookeeper.KeeperException$NodeExistsException:
>>>>>>>> KeeperErrorCode =
>>>>>>>> NodeExists
>>>>>>>>           at
>>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)

>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           at
>>>>>>>> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)

>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 3050756 [ProcessThread:-1] DEBUG
>>>>>>>> org.apache.zookeeper.server.quorum.CommitProcessor  - Processing
>>>>>>>> request::
>>>>>>>> sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>>> zxid:0x50000027e
>>>>>>>> txntype:-1 n/a
>>>>>>>> 3050756 [ProcessThread:-1] DEBUG
>>>>>>>> org.apache.zookeeper.server.quorum.Leader
>>>>>>>>     - Proposing:: sessionid:0x12bf518350f0001 type:create
>>>>>>>> cxid:0x4cca0691
>>>>>>>> zxid:0x50000027e txntype:-1 n/a
>>>>>>>> 3050756 [SyncThread:0] TRACE
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  -
>>>>>>>> Ack zxid: 0x50000027e
>>>>>>>> 3050757 [SyncThread:0] TRACE
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  -
>>>>>>>> outstanding proposal: 0x50000027e
>>>>>>>> 3050757 [SyncThread:0] TRACE
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  -
>>>>>>>> outstanding proposals all
>>>>>>>> 3050757 [SyncThread:0] DEBUG
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  -
>>>>>>>> Count for zxid: 0x50000027e is 1
>>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x50000027e
>>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  - outstanding
proposal:
>>>>>>>> 0x50000027e
>>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  - outstanding
>>>>>>>> proposals all
>>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  - Count for zxid:
>>>>>>>> 0x50000027e is
>>>>>>>> 2
>>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
>>>>>>>> org.apache.zookeeper.server.quorum.CommitProcessor  - Committing
>>>>>>>> request::
>>>>>>>> sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>>> zxid:0x50000027e
>>>>>>>> txntype:-1 n/a
>>>>>>>> 3050757 [CommitProcessor:0] DEBUG
>>>>>>>> org.apache.zookeeper.server.FinalRequestProcessor  - Processing
>>>>>>>> request::
>>>>>>>> sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>>> zxid:0x50000027e
>>>>>>>> txntype:-1 n/a
>>>>>>>> 3050757 [CommitProcessor:0] TRACE
>>>>>>>> org.apache.zookeeper.server.FinalRequestProcessor  -
>>>>>>>> :Esessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
>>>>>>>> zxid:0x50000027e
>>>>>>>> txntype:-1 n/a
>>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x50000027e
>>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  - outstanding
>>>>>>>> proposals all
>>>>>>>> 3050757 [FollowerHandler-/172.16.0.28:41062] DEBUG
>>>>>>>> org.apache.zookeeper.server.quorum.Leader  - outstanding
is 0
>>>>>>>> --------------------------
>>>>>>>>
>>>>>>>> I'm still a n00b at understanding ZK log messages, so maybe

>>>>>>>> there's
>>>>>>>> something obvious going on.  I looked in the JIRA and did
my 
>>>>>>>> best to
>>>>>>>> search
>>>>>>>> the mailing list archives, but couldn't find anything related
to
>>>>>>>> this.  Any
>>>>>>>> ideas?  Thanks very much,
>>>>>>>>
>>>>>>>> Jeremy
>>>>>>>>
>

Mime
View raw message