curator-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: How to use PathChildrenCache properly for keeping a watch on three znodes on Zookeeper?
Date Wed, 23 Jul 2014 22:12:28 GMT
FYI 

Curator has the ThreadUtils class that makes it easier to make thread pools that are daemons.
Also, Tech Note 1 describes the single thread issue: https://cwiki.apache.org/confluence/display/CURATOR/TN1

-JZ


From: Cameron McKenzie <mckenzie.cam@gmail.com>
Reply: user@curator.apache.org <user@curator.apache.org>>
Date: July 23, 2014 at 5:11:09 PM
To: user@curator.apache.org <user@curator.apache.org>>
Subject:  Re: How to use PathChildrenCache properly for keeping a watch on three znodes on
Zookeeper?  

That should be fine, remember to shutdown executors when your system is shutting down though.

Keep in mind though that the delivery of events to your childEvent() method will no longer
be serialised. (i.e. Where you would previously get EventA, EventB, EventC one after another,
you will now get EventA, B and C called at the same time. This may be fine for you, but something
to keep in mind.

If this is an issue, then instead of passing the executor to the addListener call, you can
call your executor directly from the childEvent() method whenever you need to do something
long running.


On Thu, Jul 24, 2014 at 8:05 AM, Check Peck <comptechgeeky@gmail.com> wrote:
Thanks a lot Cameron. That makes sense a lot. So If I need to do it properly I should have
my executors declare like this at the top of my class -

    private static ExecutorService service = Executors.newFixedThreadPool(15);

and then use the above "ExecutorService" like below -

    cache.getListenable().addListener(listener, service);

Am I got everything right with the above code?



On Wed, Jul 23, 2014 at 2:49 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
When you call cache.getListenable().addListener(listener), if you don't provide an executor
service, then it will use the default Curator one, which will only have a single thread. This
means that your listener will only be called by a single thread, so if you do anything in
that listener that blocks (i.e restarting the server) then you're not going to get any other
events until after your blocking event has finished.

You could either provide an executor that allows more than a single thread (and thus your childEvent
method could get called concurrently for different events), or you could use your own executor
to do the restarts, which would allow the curator event thread to keep processing stuff.


On Thu, Jul 24, 2014 at 6:33 AM, Check Peck <comptechgeeky@gmail.com> wrote:
I am using Curator library for Zookeeper. I am using zookeeper to monitor whether my app servers
are up or not. If they are not up or shut down, then bring them up. I need to keep a watch
on three of my znodes on the zookeeper. I am keeping a watch on ("/test/proc/phx/server",
("/test/proc/slc/server") and ("/test/proc/lvs/server"))I have a znode structure like below.

    /test/proc
            /phx
                /server
                    /h1
                    /h2
                    /h3
                    /h4
                    /h5
            /slc
                /server
                    /h1
                    /h2
                    /h3
                    /h4
                    /h5
            /lvs
                /server
                    /h1
                    /h2
                    /h3
                    /h4
                    /h5

As you can see above, for "/test/proc/phx/server", we have 5 hosts starting with "h", similary
for slc and lvs as well. And all those hosts starting with "h" are ephimeral znodes. Now as
soon as any server dies, let's say for PHX, h4 machine went down, then the "h4" ephemeral
znodes gets deleted from the "/test/proc/phx/server" and then I will try to re-start h4 machine
on PHX datacenter. Similarly with SLC and LVS.

Below is my code by which I am keeping a watch and re-starting the servers if any machine
went down in any datacenters. With the below code what I am seeing is, suppose if three machine
went down in same datacenter, then it restart those three one by one. Meaning let's say h1,
h3, h5 went down in PHX datacenter, then first it will restart h1 and as soon as h1 is done,
then it will restart h3 and then h5. So it is always waiting for one to get finished and then
restart another host. I am not sure why? Those three should be restarted instantly right since
it's a background thread ?

And also sometimes what I am seeing if all the hosts went down instantly then it doesn't restart
anything? May be thread is getting stuck? Does my below code looks right with the way I am
keeping a watch on three Datacenters PHX ("/test/proc/phx/server"), SLC("/test/proc/slc/server")
and LVS("/test/proc/lvs/server")
                   
    List<String> datacenters = Arrays.asList("PHX", "SLC", "LVS");
    for (String dc : datacenters) {
        // in this example we will cache data. Notice that this is optional.
        PathChildrenCache cache = new PathChildrenCache(zookClient.getClient(), "/test/proc"
+ "/" + dc + "/" + "server", true);
        cache.start();

        addListener(cache);
    }
   
    private static void addListener(PathChildrenCache cache) {

        PathChildrenCacheListener listener = new PathChildrenCacheListener() {
            public void childEvent(CuratorFramework client, PathChildrenCacheEvent
event) throws Exception {
                switch (event.getType()) {
                case CHILD_ADDED: {
                    if (zookClient.isLeader()) {
                        String path = ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
                        String node = ZKPaths.getNodeFromPath(event.getData().getPath());
                        String datacenter = path.split("/")[3];

                        System.out.println("Node added: Path= ", path,
", Actual Node= ", node, ", Datacenter= ", datacenter);

                        break;
                    }
                }

                case CHILD_UPDATED: {
                    if (zookClient.isLeader()) {
                        String path = ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
                        String node = ZKPaths.getNodeFromPath(event.getData().getPath());
                        String datacenter = path.split("/")[3];

                        System.out.println("Node updated: Path= ",
path, ", Actual Node= ", node, ", Datacenter= ", datacenter);

                        break;
                    }
                }

                case CHILD_REMOVED: {
                    if (zookClient.isLeader()) {
                        String path = ZKPaths.getPathAndNode(event.getData().getPath()).getPath();
                        String node = ZKPaths.getNodeFromPath(event.getData().getPath());
                        String datacenter = path.split("/")[3];

                        System.out.println("Node removed: Path= ",
path, ", Actual Node= ", node, ", Datacenter= ", datacenter);

                        // restart machine which goes down
                        // I am assuming as soon as any machine went down,
call will come here instantly without waiting for anything?

                        break;
                    }
                }
                default:
                    break;

                }
            }
        };
        cache.getListenable().addListener(listener);
    }   




Mime
View raw message