curator-user mailing list archives

From Check Peck <comptechge...@gmail.com>
Subject How to use PathChildrenCache properly for keeping a watch on three znodes on Zookeeper?
Date Wed, 23 Jul 2014 20:33:33 GMT
I am using the Curator library for ZooKeeper. I use ZooKeeper to monitor
whether my app servers are up; if any of them are down or shut down, I
bring them back up. I need to keep a watch on three znodes in ZooKeeper:
"/test/proc/phx/server", "/test/proc/slc/server" and
"/test/proc/lvs/server". My znode structure looks like this:

    /test/proc
            /phx
                /server
                    /h1
                    /h2
                    /h3
                    /h4
                    /h5
            /slc
                /server
                    /h1
                    /h2
                    /h3
                    /h4
                    /h5
            /lvs
                /server
                    /h1
                    /h2
                    /h3
                    /h4
                    /h5
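The three watched parents differ only in the datacenter segment, so they can be built from one list. Below is a minimal sketch using a hypothetical helper `serverPath` (not part of Curator). One thing to keep in mind: ZooKeeper paths are case-sensitive, so the datacenter names have to match the lowercase znodes shown above ("phx", not "PHX").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class WatchedPaths {
    // hypothetical helper: parent znode whose ephemeral children are the hosts
    static String serverPath(String datacenter) {
        // ZooKeeper paths are case-sensitive, so normalize "PHX" to "phx"
        return "/test/proc/" + datacenter.toLowerCase(Locale.ROOT) + "/server";
    }

    // one parent path per datacenter, in the same order as the input
    static List<String> allServerPaths(List<String> datacenters) {
        List<String> paths = new ArrayList<>();
        for (String dc : datacenters) {
            paths.add(serverPath(dc));
        }
        return paths;
    }

    public static void main(String[] args) {
        for (String p : allServerPaths(Arrays.asList("PHX", "SLC", "LVS"))) {
            System.out.println(p);
        }
    }
}
```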

As you can see above, "/test/proc/phx/server" has 5 hosts whose names start
with "h", and the same goes for slc and lvs. All of those "h" znodes are
ephemeral. As soon as any server dies, say the h4 machine in PHX goes down,
its ephemeral znode "h4" is deleted from "/test/proc/phx/server", and I then
try to restart the h4 machine in the PHX datacenter. The same applies to SLC
and LVS.
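When a host's ephemeral znode disappears, the full path of the removed child is enough to tell which machine to restart and where. The sketch below splits a path shaped like "/test/proc/<dc>/server/<host>" the same way the listener code does with `path.split("/")[3]`; `HostPath` and `parse` are hypothetical names, and the sketch assumes exactly the layout shown above.

```java
class HostPath {
    final String datacenter;
    final String host;

    HostPath(String datacenter, String host) {
        this.datacenter = datacenter;
        this.host = host;
    }

    // hypothetical parser for paths shaped like "/test/proc/<dc>/server/<host>"
    static HostPath parse(String fullPath) {
        // the leading "/" yields an empty first element:
        // ["", "test", "proc", dc, "server", host]
        String[] parts = fullPath.split("/");
        if (parts.length != 6 || !"server".equals(parts[4])) {
            throw new IllegalArgumentException("unexpected path: " + fullPath);
        }
        return new HostPath(parts[3], parts[5]);
    }

    public static void main(String[] args) {
        HostPath hp = parse("/test/proc/phx/server/h4");
        // prints "restart h4 in phx"
        System.out.println("restart " + hp.host + " in " + hp.datacenter);
    }
}
```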

Below is the code I use to keep a watch and restart servers when any machine
goes down in any datacenter. What I see with this code is that if three
machines go down in the same datacenter, they are restarted one by one. For
example, if h1, h3 and h5 go down in the PHX datacenter, it first restarts
h1, and only when h1 is done does it restart h3, and then h5. So it always
waits for one restart to finish before starting the next. I am not sure
why. Shouldn't all three be restarted immediately, since this runs on a
background thread?
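A plausible explanation, as far as I understand Curator: PathChildrenCache delivers events to its listeners from a single thread (by default a single-threaded executor), so if the restart runs inside `childEvent`, the next CHILD_REMOVED event is not delivered until the previous restart returns. A common remedy is to keep the listener short and hand the slow work to a separate pool. The sketch below simulates this with only `java.util.concurrent`; it is not Curator code. A single-threaded "event" executor stands in for the cache's dispatch thread, and `restartHost` is a hypothetical stand-in for the real restart. (If I remember the API correctly, Curator's `Listenable` also has an `addListener(listener, executor)` overload, but that only changes which thread runs the listener.)

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RestartDispatch {
    // hypothetical stand-in for a slow restart: just blocks for durationMs
    static void restartHost(String host, long durationMs) {
        try {
            Thread.sleep(durationMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Delivers one "removed" event per host on a single event thread and
    // returns the wall-clock milliseconds until every restart has finished.
    static long run(List<String> removedHosts, long restartMs, boolean handOff) {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService restartPool = Executors.newFixedThreadPool(removedHosts.size());
        CountDownLatch done = new CountDownLatch(removedHosts.size());
        long start = System.nanoTime();
        for (String host : removedHosts) {
            Runnable restart = () -> {
                restartHost(host, restartMs);
                done.countDown();
            };
            // simulated listener body: either run the restart on the event thread
            // (blocking later events) or hand it to the pool and return at once
            Runnable listenerBody = handOff ? () -> restartPool.execute(restart) : restart;
            eventThread.execute(listenerBody);
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        eventThread.shutdown();
        restartPool.shutdown();
        return elapsedMs;
    }

    public static void main(String[] args) {
        List<String> down = Arrays.asList("h1", "h3", "h5");
        System.out.println("restart inside the listener: " + run(down, 300, false) + " ms");
        System.out.println("restart handed to a pool:    " + run(down, 300, true) + " ms");
    }
}
```

With three 300 ms restarts, the first variant takes roughly 900 ms because each event waits for the previous restart, while the second finishes in roughly 300 ms.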

Also, sometimes when all the hosts go down at once, nothing gets restarted
at all. Maybe the thread is getting stuck? Does the code below look right
for keeping a watch on the three datacenters PHX ("/test/proc/phx/server"),
SLC ("/test/proc/slc/server") and LVS ("/test/proc/lvs/server")?

    import java.util.Arrays;
    import java.util.List;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.cache.PathChildrenCache;
    import org.apache.curator.framework.recipes.cache.PathChildrenCacheEvent;
    import org.apache.curator.framework.recipes.cache.PathChildrenCacheListener;
    import org.apache.curator.utils.ZKPaths;

    // the znodes are lowercase and ZooKeeper paths are case-sensitive
    List<String> datacenters = Arrays.asList("phx", "slc", "lvs");
    for (String dc : datacenters) {
        // in this example we will cache data. Notice that this is optional.
        PathChildrenCache cache = new PathChildrenCache(
                zookClient.getClient(), "/test/proc/" + dc + "/server", true);

        // register the listener before start() so early events are not missed
        addListener(cache);
        cache.start();
    }

    private static void addListener(PathChildrenCache cache) {

        PathChildrenCacheListener listener = new PathChildrenCacheListener() {
            @Override
            public void childEvent(CuratorFramework client,
                    PathChildrenCacheEvent event) throws Exception {
                switch (event.getType()) {
                case CHILD_ADDED: {
                    if (zookClient.isLeader()) {
                        String path = ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
                        String node = ZKPaths.getNodeFromPath(event.getData().getPath());
                        // path is "/test/proc/<dc>/server", so element 3 is the datacenter
                        String datacenter = path.split("/")[3];

                        System.out.println("Node added: Path= " + path
                                + ", Actual Node= " + node + ", Datacenter= " + datacenter);
                    }
                    // break outside the if, otherwise a non-leader falls through
                    break;
                }

                case CHILD_UPDATED: {
                    if (zookClient.isLeader()) {
                        String path = ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
                        String node = ZKPaths.getNodeFromPath(event.getData().getPath());
                        String datacenter = path.split("/")[3];

                        System.out.println("Node updated: Path= " + path
                                + ", Actual Node= " + node + ", Datacenter= " + datacenter);
                    }
                    break;
                }

                case CHILD_REMOVED: {
                    if (zookClient.isLeader()) {
                        String path = ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
                        String node = ZKPaths.getNodeFromPath(event.getData().getPath());
                        String datacenter = path.split("/")[3];

                        System.out.println("Node removed: Path= " + path
                                + ", Actual Node= " + node + ", Datacenter= " + datacenter);

                        // restart the machine that went down
                        // I am assuming that as soon as any machine goes down, the call
                        // arrives here instantly without waiting for anything?
                    }
                    break;
                }

                default:
                    break;
                }
            }
        };
        cache.getListenable().addListener(listener);
    }
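A detail worth checking in listeners like the one above: if each `break` is placed inside the `if (zookClient.isLeader())` block rather than after it, a non-leader falls through from one case into the next (CHILD_ADDED into CHILD_UPDATED into CHILD_REMOVED). That is harmless while every case re-checks the leader flag, but it misfires as soon as one case does any work outside the check. The stand-alone sketch below uses a hypothetical `EventType` enum in place of Curator's `PathChildrenCacheEvent.Type`, plus a hypothetical unguarded action in CHILD_REMOVED, to show the difference.

```java
import java.util.ArrayList;
import java.util.List;

class FallThroughDemo {
    // hypothetical stand-in for PathChildrenCacheEvent.Type
    enum EventType { CHILD_ADDED, CHILD_UPDATED, CHILD_REMOVED, INITIALIZED }

    // break only inside the guarded block: non-leaders fall through
    static List<String> breakInsideIf(EventType type, boolean leader) {
        List<String> handled = new ArrayList<>();
        switch (type) {
        case CHILD_ADDED:
            if (leader) { handled.add("added"); break; }
        case CHILD_UPDATED:
            if (leader) { handled.add("updated"); break; }
        case CHILD_REMOVED:
            handled.add("removed"); // hypothetical unguarded work, e.g. a log line
            break;
        default:
            break;
        }
        return handled;
    }

    // each case ends with its own unconditional break
    static List<String> breakAfterIf(EventType type, boolean leader) {
        List<String> handled = new ArrayList<>();
        switch (type) {
        case CHILD_ADDED:
            if (leader) { handled.add("added"); }
            break;
        case CHILD_UPDATED:
            if (leader) { handled.add("updated"); }
            break;
        case CHILD_REMOVED:
            handled.add("removed");
            break;
        default:
            break;
        }
        return handled;
    }

    public static void main(String[] args) {
        // a non-leader receiving CHILD_ADDED wrongly runs the CHILD_REMOVED work
        System.out.println(breakInsideIf(EventType.CHILD_ADDED, false)); // [removed]
        System.out.println(breakAfterIf(EventType.CHILD_ADDED, false));  // []
    }
}
```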
