hadoop-zookeeper-user mailing list archives

From Vishal K <vishalm...@gmail.com>
Subject Re: How to handle "Node does not exist" error?
Date Thu, 12 Aug 2010 13:33:24 GMT
Hi Ted,

Can you explain why running ZK in embedded mode can cause znode
inconsistencies?
Thanks.

-Vishal

On Thu, Aug 12, 2010 at 12:01 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Try running the server in non-embedded mode.
>
> Also, you are assuming that you know everything about how to configure the
> quorumPeer.  That is going to change and your code will break at that time.
>  If you use a non-embedded cluster, this won't be a problem and you will be
> able to upgrade ZK version without having to restart your service.
>
> My own opinion is that running an embedded ZK is a serious architectural
> error.  Since I don't know your particular situation, it might be different,
> but there is an inherent contradiction involved in running a coordination
> layer as part of the thing being coordinated.  Whatever your software does,
> it isn't what ZK does.  As such, it is better to factor out the ZK
> functionality and make it completely stable.  That gives you a much simpler
> world and will make it easier for you to troubleshoot your system.  The
> simple fact that you can't take down your service without affecting the
> reliability of your ZK layer makes this a very bad idea.
>
> The problems you are having now are only a preview of what this
> architectural error leads to.  There will be more problems and many of them
> are likely to be more subtle and lead to service interruptions and lots of
> wasted time.
>
> On Wed, Aug 11, 2010 at 8:49 PM, Dr Hao He <he@softtouchit.com> wrote:
>
> > hi, Ted and Mahadev,
> >
> >
> > Here are some more details about my setup:
> >
> > I run zookeeper in the embedded mode with the following code:
> >
> >     quorumPeer = new QuorumPeer();
> >     quorumPeer.setClientPort(getClientPort());
> >     quorumPeer.setTxnFactory(new FileTxnSnapLog(
> >             new File(getDataLogDir()), new File(getDataDir())));
> >     quorumPeer.setQuorumPeers(getServers());
> >     quorumPeer.setElectionType(getElectionAlg());
> >     quorumPeer.setMyid(getServerId());
> >     quorumPeer.setTickTime(getTickTime());
> >     quorumPeer.setInitLimit(getInitLimit());
> >     quorumPeer.setSyncLimit(getSyncLimit());
> >     quorumPeer.setQuorumVerifier(getQuorumVerifier());
> >     quorumPeer.setCnxnFactory(cnxnFactory);
> >     quorumPeer.start();
> >
> >
> > The configuration values are read from the following XML document for
> > server 1:
> >
> > <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181" serverId="1">
> >     <member id="1" host="192.168.2.6:2888:3888"/>
> >     <member id="2" host="192.168.2.3:2888:3888"/>
> >     <member id="3" host="192.168.2.4:2888:3888"/>
> > </cluster>
> >
> >
> > The other servers have the same configuration except that their ids are
> > changed to 2 and 3.
> >
> > The error occurred on server 3 when I batch loaded some messages to server
> > 1.  However, this error does not always happen.  I am not sure exactly what
> > triggered this error yet.
> >
> > I also performed the "stat" operation on one of the nodes reported as not
> > existing and got:
> >
> > stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
> > Exception in thread "main" java.lang.NullPointerException
> >        at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
> >        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
> >        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
> >        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
> >        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
> >        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
> > [xpe@t43 zookeeper-3.2.2]$ bin/zkCli.sh
> >
> >
> > Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and are
> > deleted by the last server that has read them.
> >
> > If I remove the troubled server's zookeeper log directory and restart the
> > server, then everything is ok.
> >
> > I will try to get the nc result next time I see this problem.
> >
> >
> > Dr Hao He
> >
> > XPE - the truly SOA platform
> >
> > he@softtouchit.com
> > http://softtouchit.com
> > http://itunes.com/apps/Scanmobile
> >
> > On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:
> >
> > > Hi Dr Hao,
> > >  Can you please post the configuration of all 3 ZooKeeper servers?  I
> > > suspect the clusters might be misconfigured and might not belong to the
> > > same ensemble.
> > >
> > > Just to be clear:
> > > /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
> > >
> > > and other such nodes exist on one of the ZooKeeper servers, and the same
> > > nodes do not exist on the other servers?
> > >
> > > Also, as Ted pointed out, can you please post the output of
> > > echo "stat" | nc localhost 2181 (on all 3 servers) to the list?
> > >
> > > Thanks
> > > mahadev
> > >
> > >
> > >
> > > On 8/11/10 12:10 AM, "Dr Hao He" <he@softtouchit.com> wrote:
> > >
> > >> hi, Ted,
> > >>
> > >> Thanks for the reply.  Here is what I did:
> > >>
> > >> [zk: localhost:2181(CONNECTED) 0] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >> []
> > >> [zk: localhost:2181(CONNECTED) 1] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
> > >> [msg0000002807, msg0000002700, msg0000002701, msg0000002804,
> > msg0000002704,
> > >> msg0000002706, msg0000002601, msg0000001849, msg0000001847,
> > msg0000002508,
> > >> msg0000002609, msg0000001841, msg0000002607, msg0000002606,
> > msg0000002604,
> > >> msg0000002809, msg0000002817, msg0000001633, msg0000002812,
> > msg0000002814,
> > >> msg0000002711, msg0000002815, msg0000002713, msg0000002716,
> > msg0000001772,
> > >> msg0000002811, msg0000001635, msg0000001774, msg0000002515,
> > msg0000002610,
> > >> msg0000001838, msg0000002517, msg0000002612, msg0000002519,
> > msg0000001973,
> > >> msg0000001835, msg0000001974, msg0000002619, msg0000001831,
> > msg0000002510,
> > >> msg0000002512, msg0000002615, msg0000002614, msg0000002617,
> > msg0000002104,
> > >> msg0000002106, msg0000001769, msg0000001768, msg0000002828,
> > msg0000002822,
> > >> msg0000001760, msg0000002820, msg0000001963, msg0000001961,
> > msg0000002110,
> > >> msg0000002118, msg0000002900, msg0000002836, msg0000001757,
> > msg0000002907,
> > >> msg0000001753, msg0000001752, msg0000001755, msg0000001952,
> > msg0000001958,
> > >> msg0000001852, msg0000001956, msg0000001854, msg0000002749,
> > msg0000001608,
> > >> msg0000001609, msg0000002747, msg0000002882, msg0000001743,
> > msg0000002888,
> > >> msg0000001605, msg0000002885, msg0000001487, msg0000001746,
> > msg0000002330,
> > >> msg0000001749, msg0000001488, msg0000001489, msg0000001881,
> > msg0000001491,
> > >> msg0000002890, msg0000001889, msg0000002758, msg0000002241,
> > msg0000002892,
> > >> msg0000002852, msg0000002759, msg0000002898, msg0000002850,
> > msg0000001733,
> > >> msg0000002751, msg0000001739, msg0000002753, msg0000002756,
> > msg0000002332,
> > >> msg0000001872, msg0000002233, msg0000001721, msg0000001627,
> > msg0000001720,
> > >> msg0000001625, msg0000001628, msg0000001629, msg0000001729,
> > msg0000002350,
> > >> msg0000001727, msg0000002352, msg0000001622, msg0000001726,
> > msg0000001623,
> > >> msg0000001723, msg0000001724, msg0000001621, msg0000002736,
> > msg0000002738,
> > >> msg0000002363, msg0000001717, msg0000002878, msg0000002362,
> > msg0000002361,
> > >> msg0000001611, msg0000001894, msg0000002357, msg0000002218,
> > msg0000002358,
> > >> msg0000002355, msg0000001895, msg0000002356, msg0000001898,
> > msg0000002354,
> > >> msg0000001996, msg0000001990, msg0000002093, msg0000002880,
> > msg0000002576,
> > >> msg0000002579, msg0000002267, msg0000002266, msg0000002366,
> > msg0000001901,
> > >> msg0000002365, msg0000001903, msg0000001799, msg0000001906,
> > msg0000002368,
> > >> msg0000001597, msg0000002679, msg0000002166, msg0000001595,
> > msg0000002481,
> > >> msg0000002482, msg0000002373, msg0000002374, msg0000002371,
> > msg0000001599,
> > >> msg0000002773, msg0000002274, msg0000002275, msg0000002270,
> > msg0000002583,
> > >> msg0000002271, msg0000002580, msg0000002067, msg0000002277,
> > msg0000002278,
> > >> msg0000002376, msg0000002180, msg0000002467, msg0000002378,
> > msg0000002182,
> > >> msg0000002377, msg0000002184, msg0000002379, msg0000002187,
> > msg0000002186,
> > >> msg0000002665, msg0000002666, msg0000002381, msg0000002382,
> > msg0000002661,
> > >> msg0000002662, msg0000002663, msg0000002385, msg0000002284,
> > msg0000002766,
> > >> msg0000002282, msg0000002190, msg0000002599, msg0000002054,
> > msg0000002596,
> > >> msg0000002453, msg0000002459, msg0000002457, msg0000002456,
> > msg0000002191,
> > >> msg0000002652, msg0000002395, msg0000002650, msg0000002656,
> > msg0000002655,
> > >> msg0000002189, msg0000002047, msg0000002658, msg0000002659,
> > msg0000002796,
> > >> msg0000002250, msg0000002255, msg0000002589, msg0000002257,
> > msg0000002061,
> > >> msg0000002064, msg0000002585, msg0000002258, msg0000002587,
> > msg0000002444,
> > >> msg0000002446, msg0000002447, msg0000002450, msg0000002646,
> > msg0000001501,
> > >> msg0000002591, msg0000002592, msg0000001503, msg0000001506,
> > msg0000002260,
> > >> msg0000002594, msg0000002262, msg0000002263, msg0000002264,
> > msg0000002590,
> > >> msg0000002132, msg0000002130, msg0000002530, msg0000002931,
> > msg0000001559,
> > >> msg0000001808, msg0000002024, msg0000001553, msg0000002939,
> > msg0000002937,
> > >> msg0000001556, msg0000002935, msg0000002933, msg0000002140,
> > msg0000001937,
> > >> msg0000002143, msg0000002520, msg0000002522, msg0000002429,
> > msg0000002524,
> > >> msg0000002920, msg0000002035, msg0000001561, msg0000002134,
> > msg0000002138,
> > >> msg0000002925, msg0000002151, msg0000002287, msg0000002555,
> > msg0000002010,
> > >> msg0000002002, msg0000002290, msg0000001537, msg0000002005,
> > msg0000002147,
> > >> msg0000002145, msg0000002698, msg0000001592, msg0000001810,
> > msg0000002690,
> > >> msg0000002691, msg0000001911, msg0000001910, msg0000002693,
> > msg0000001812,
> > >> msg0000001817, msg0000001547, msg0000002012, msg0000002015,
> > msg0000002941,
> > >> msg0000001688, msg0000002018, msg0000002684, msg0000002944,
> > msg0000001540,
> > >> msg0000002686, msg0000001541, msg0000002946, msg0000002688,
> > msg0000001584,
> > >> msg0000002948]
> > >>
> > >> [zk: localhost:2181(CONNECTED) 7] delete /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >> Node does not exist: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
> > >>
> > >> When I performed the same operations on another server, none of those
> > >> nodes existed.
> > >>
> > >>
> > >> Dr Hao He
> > >>
> > >> XPE - the truly SOA platform
> > >>
> > >> he@softtouchit.com
> > >> http://softtouchit.com
> > >> http://itunes.com/apps/Scanmobile
> > >>
> > >> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
> > >>
> > >>> Can you provide some more information?  The output of some of the
> > >>> four-letter commands and a transcript of what you are doing would be
> > >>> very helpful.
> > >>>
> > >>> Also, there is no way for znodes to exist on one node of a properly
> > >>> operating ZK cluster and not on either of the other two.  Something has
> > >>> to be wrong and I would vote for operator error (not to cast aspersions,
> > >>> it is just that humans like you and *me* make more errors than ZK does).
> > >>>
> > >>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <he@softtouchit.com> wrote:
> > >>>
> > >>>> hi, All,
> > >>>>
> > >>>> I have a 3-host cluster running ZooKeeper 3.2.2.  On one of the hosts,
> > >>>> there are a number of nodes that I can "get" and "ls" using zkCli.sh.
> > >>>> However, when I tried to "delete" any of them, I got a "Node does not
> > >>>> exist" error.  Those nodes do not exist on the other two hosts.
> > >>>>
> > >>>> Any idea how we should handle this type of error and what might have
> > >>>> caused this problem?
> > >>>>
> > >>>> Dr Hao He
> > >>>>
> > >>>> XPE - the truly SOA platform
> > >>>>
> > >>>> he@softtouchit.com
> > >>>> http://softtouchit.com
> > >>>> http://itunes.com/apps/Scanmobile
> > >>>>
> > >>>>
> > >>
> > >>
> > >
> > >
> >
> >
>
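
For reference, the non-embedded setup Ted suggests could be configured with a
plain zoo.cfg carrying the same values as the XML quoted above.  This is only
a sketch: the dataDir path is an assumption (the thread does not say where
getDataDir() points), and each server would additionally need a file named
myid inside dataDir containing its own id (1, 2 or 3).

```properties
# Timing and client port, copied from the <cluster> element quoted above
tickTime=1000
initLimit=10
syncLimit=5
clientPort=2181

# Assumed location; substitute the directory your deployment actually uses
dataDir=/var/lib/zookeeper

# Ensemble members, copied from the <member> elements quoted above
server.1=192.168.2.6:2888:3888
server.2=192.168.2.3:2888:3888
server.3=192.168.2.4:2888:3888
```

The same file can be used on all three hosts; only the myid file differs.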

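If nc is not available on the hosts, the "stat" check Mahadev asked for can be
approximated from Java with a plain socket.  This is a minimal sketch, not
ZooKeeper's own tooling: the class name FourLetterWord and the 5 second
timeouts are my own choices, and it assumes the server answers a four-letter
command and then closes the connection, as it does for nc.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class FourLetterWord {

    /** Sends a four-letter command such as "stat" and returns the raw reply. */
    public static String send(String host, int port, String cmd) throws IOException {
        try (Socket socket = new Socket()) {
            // Hypothetical timeouts so a dead server does not hang the check.
            socket.connect(new InetSocketAddress(host, port), 5000);
            socket.setSoTimeout(5000);

            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Like nc: close our sending side, then read until the server closes.
            socket.shutdownOutput();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                reply.write(chunk, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    /** Queries every host given on the command line on client port 2181. */
    public static void main(String[] args) throws IOException {
        for (String host : args) {
            System.out.println("== " + host + " ==");
            System.out.println(send(host, 2181, "stat"));
        }
    }
}
```

Run it as "java FourLetterWord 192.168.2.6 192.168.2.3 192.168.2.4" and compare
the replies side by side.  If the three servers really form one ensemble, I
would expect the Mode lines to show one leader and two followers; servers that
each report something else would point toward the misconfiguration Mahadev
suspects.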