curator-user mailing list archives

From Cameron McKenzie <mckenzie....@gmail.com>
Subject Re: How to use PathChildrenCache properly for keeping a watch on three znodes on Zookeeper?
Date Wed, 23 Jul 2014 21:49:42 GMT
When you call cache.getListenable().addListener(listener) without providing
an executor service, Curator uses its default one, which has only a single
thread. Your listener is therefore only ever called by that one thread, so
if you do anything in the listener that blocks (e.g. restarting a server),
you won't receive any other events until the blocking call has finished.

You could either provide an executor that allows more than one thread (so
that your childEvent method can be called concurrently for different
events), or use your own executor to do the restarts, which would let the
Curator event thread keep processing events.
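
To illustrate the second option, here is a minimal sketch in plain Java. It
does not use Curator at all: a single-thread executor stands in for the
event thread, and RestartOffloadSketch, restartHost and restartAll are
hypothetical names made up for this example. The "restart" is just a sleep.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RestartOffloadSketch {

    /** Hypothetical stand-in for a blocking restart; it only sleeps. */
    static void restartHost(String host, long restartMillis) throws InterruptedException {
        Thread.sleep(restartMillis);
    }

    /**
     * Delivers one "child removed" event per host on a single event thread
     * and hands each restart to a separate pool with the given number of
     * workers. Returns the elapsed milliseconds until all restarts finish.
     */
    static long restartAll(List<String> hosts, int workers, long restartMillis) {
        // Assumption: this executor plays the role of the single event thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        // Our own executor, used only for the blocking restarts.
        ExecutorService restartPool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(hosts.size());
        long start = System.nanoTime();
        try {
            for (String host : hosts) {
                // The "listener" returns immediately; it only submits the work.
                eventThread.execute(() -> restartPool.execute(() -> {
                    try {
                        restartHost(host, restartMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }));
            }
            done.await();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for restarts", e);
        } finally {
            eventThread.shutdown();
            restartPool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("h1", "h3", "h5");
        // One worker behaves like doing the restart inside the listener.
        System.out.println("1 worker:  " + restartAll(hosts, 1, 200) + " ms");
        // Three workers let the restarts overlap.
        System.out.println("3 workers: " + restartAll(hosts, 3, 200) + " ms");
    }
}
```

With one worker the restarts run back to back, which is the behaviour
described in the question. With three workers they can overlap while the
event thread stays free. In real code the body of childEvent would do the
restartPool.execute(...) call.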


On Thu, Jul 24, 2014 at 6:33 AM, Check Peck <comptechgeeky@gmail.com> wrote:

> I am using the Curator library for ZooKeeper to monitor whether my app
> servers are up. If one is down or has been shut down, I bring it back up.
> I need to keep a watch on three znodes: "/test/proc/phx/server",
> "/test/proc/slc/server" and "/test/proc/lvs/server". The znode structure
> is shown below.
>
>     /test/proc
>             /phx
>                 /server
>                     /h1
>                     /h2
>                     /h3
>                     /h4
>                     /h5
>             /slc
>                 /server
>                     /h1
>                     /h2
>                     /h3
>                     /h4
>                     /h5
>             /lvs
>                 /server
>                     /h1
>                     /h2
>                     /h3
>                     /h4
>                     /h5
>
> As you can see above, "/test/proc/phx/server" has five hosts whose names
> start with "h", and the same goes for slc and lvs. All of those host
> znodes are ephemeral. As soon as a server dies, say the h4 machine in PHX
> goes down, the "h4" ephemeral znode gets deleted from
> "/test/proc/phx/server", and I then try to restart the h4 machine in the
> PHX datacenter. The same applies to SLC and LVS.
>
> Below is the code I use to keep the watch and restart servers when a
> machine goes down in any datacenter. What I am seeing is that if three
> machines go down in the same datacenter, they are restarted one by one.
> For example, if h1, h3 and h5 go down in the PHX datacenter, it first
> restarts h1, and only once h1 is done does it restart h3, and then h5. So
> it always waits for one restart to finish before starting the next. I am
> not sure why. Shouldn't those three be restarted at the same time, since
> this runs on a background thread?
>
> Also, sometimes when all the hosts go down at once, nothing gets
> restarted at all. Maybe the thread is getting stuck? Does my code below
> look right for keeping a watch on the three datacenters, PHX
> ("/test/proc/phx/server"), SLC ("/test/proc/slc/server") and LVS
> ("/test/proc/lvs/server")?
>
>     List<String> datacenters = Arrays.asList("PHX", "SLC", "LVS");
>     for (String dc : datacenters) {
>         // znode paths are case-sensitive and the tree above uses lowercase
>         String serverPath = "/test/proc/" + dc.toLowerCase() + "/server";
>         // in this example we will cache data. Notice that this is optional.
>         PathChildrenCache cache = new PathChildrenCache(zookClient.getClient(), serverPath, true);
>         cache.start();
>
>         addListener(cache);
>     }
>
>     private static void addListener(PathChildrenCache cache) {
>
>         PathChildrenCacheListener listener = new PathChildrenCacheListener() {
>             public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) throws Exception {
>                 switch (event.getType()) {
>                 case CHILD_ADDED: {
>                     if (zookClient.isLeader()) {
>                         String path = ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
>                         String node = ZKPaths.getNodeFromPath(event.getData().getPath());
>                         String datacenter = path.split("/")[3];
>
>                         System.out.println("Node added: Path= " + path + ", Actual Node= " + node + ", Datacenter= " + datacenter);
>                     }
>                     // break sits outside the if so a non-leader never falls through
>                     break;
>                 }
>
>                 case CHILD_UPDATED: {
>                     if (zookClient.isLeader()) {
>                         String path = ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
>                         String node = ZKPaths.getNodeFromPath(event.getData().getPath());
>                         String datacenter = path.split("/")[3];
>
>                         System.out.println("Node updated: Path= " + path + ", Actual Node= " + node + ", Datacenter= " + datacenter);
>                     }
>                     break;
>                 }
>
>                 case CHILD_REMOVED: {
>                     if (zookClient.isLeader()) {
>                         String path = ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
>                         String node = ZKPaths.getNodeFromPath(event.getData().getPath());
>                         String datacenter = path.split("/")[3];
>
>                         System.out.println("Node removed: Path= " + path + ", Actual Node= " + node + ", Datacenter= " + datacenter);
>
>                         // restart machine which goes down
>                         // I am assuming as soon as any machine went down,
>                         // call will come here instantly without waiting for anything?
>                     }
>                     break;
>                 }
>
>                 default:
>                     break;
>                 }
>             }
>         };
>         cache.getListenable().addListener(listener);
>     }
>
