zookeeper-user mailing list archives

From Jung Young Seok <jung.youngs...@gmail.com>
Subject Re: Regarding large number of watch count
Date Wed, 07 May 2014 07:37:46 GMT
Thank you for your suggestion.

I've checked the watched paths with wchp.
As you can see, zk_watch_count is 3587 (output pasted below),
even though we delete every node after use.

# zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /aws/user
[]


# echo wchp | nc localhost 2181 | wc -l
7175  (zk_watch_count is 3587; wchp prints a path line plus a session line per watch)


# echo wchp | nc localhost 2181
/aws/user/9cb3ecea-fe4a-4b47/_c_f6b8c5ca-f4e3-4755-9bab-df27c4ff239f-lock-0000000000
        0x145983186f10006
/aws/user/6534f8b7-8707-4641/_c_5dbf1dea-7392-436c-97fe-ba97d45068fe-lock-0000000001
        0x145983186f10007
/aws/user/00e2422a-8afb-4ea9/_c_4bcc9937-ddac-4b67-ad79-c9d36f8da0b0-lock-0000000000
        0x145983186f10007
/aws/user/08ebc05d-acb1-4243/_c_362a838a-deda-48cb-b6ed-d37417191dc9-lock-0000000003
        0x145983186f10007
/aws/user/05da9afd-4e01-4843/_c_e18fd405-e509-4b2c-9419-6a4b5d85477a-lock-0000000000
        0x145983186f10007
/aws/user/4da11842-a033-4016/_c_6edbba60-ad3c-4dad-acfa-2c2fb49a9c89-lock-0000000000
        0x145983186f10007
/aws/user/a8b4055b-7248-45ef/_c_df409fd1-9b0e-4c72-a4a4-5ee0f5a09dc8-lock-0000000000
        0x145983186f10006
/aws/user/4bf4c860-e3ad-442c/_c_5cdca3aa-490f-475a-8430-3f1e56dd6bae-lock-0000000000
        0x145983186f10006
/aws/user/4952533f-8d34-4041/_c_72f4bd75-ad6f-454c-a734-9bc85948f3e4-lock-0000000000
        0x145983186f10007
/aws/user/a9d04409-9c69-4fb6/_c_2d4f1df0-0a0f-4608-ab28-94460f8d0773-lock-0000000000
        0x145983186f10007

.... omit ....


We use ZooKeeper through the Curator framework. Here's how we use it:

...
// acquire part

if (vo == null) {
    vo = new LockVO();
    vo.fullPath = fullPath;
    vo.lock = new InterProcessMutex(client, vo.fullPath);
    lockFullPathMap.put(fullPath, vo);
}

try {
    if (!vo.lock.acquire(lockTimeout, TimeUnit.MILLISECONDS)) {
        throw new IllegalStateException("Fail to acquire lock");
    }
    vo.count++;
} catch (final RuntimeException e) {
    throw e;
}
...


...
// release part
vo.lock.release();
if (vo.count == 0) {
    try {
        client.delete().forPath(fullPath);
    } catch (final NoNodeException | NotEmptyException e) {
        /* ignore exception */
    }
}
...
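The ellipses above hide the bookkeeping that decrements vo.count; the intent is that acquires and releases balance out, roughly as in this simplified sketch. It is an illustration, not our production code: LockVO is trimmed down and Curator's InterProcessMutex call is omitted so the sketch runs standalone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: the real code wraps Curator's InterProcessMutex;
// here the lock itself is stubbed out so the reference counting runs standalone.
public class LockSketch {
    static class LockVO {
        String fullPath;
        int count; // outstanding acquires on this path
    }

    static final Map<String, LockVO> lockFullPathMap = new ConcurrentHashMap<>();

    static LockVO acquire(String fullPath) {
        LockVO vo = lockFullPathMap.computeIfAbsent(fullPath, p -> {
            LockVO v = new LockVO();
            v.fullPath = p;
            return v;
        });
        // real code: vo.lock.acquire(lockTimeout, TimeUnit.MILLISECONDS)
        vo.count++;
        return vo;
    }

    static void release(LockVO vo) {
        // real code: vo.lock.release()
        vo.count--;
        if (vo.count == 0) {
            // real code: client.delete().forPath(vo.fullPath),
            // ignoring NoNodeException / NotEmptyException
            lockFullPathMap.remove(vo.fullPath);
        }
    }

    public static void main(String[] args) {
        LockVO vo = acquire("/aws/user/some-key");
        acquire("/aws/user/some-key"); // re-entrant second acquire
        release(vo);                   // count drops to 1, node kept
        release(vo);                   // count drops to 0, node deleted
        System.out.println(lockFullPathMap.isEmpty()); // prints "true"
    }
}
```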

I don't understand why the watch count keeps growing even though we
delete every node after the lock is used.
The watch count increases slowly, not on every acquire/release request
(it took 7 days to reach 3587).

When we restart all the WAS instances connected to ZooKeeper, the watch
count drops back to zero.

Do you have any idea why it keeps increasing?
Is there any way to clear the watch count in a running environment?
I'm worried the server may suddenly fail because of a memory leak or a
crash.

I'd appreciate any ideas.
Thanks,

Sincerely,
Youngseok Jung


2014-04-30 2:00 GMT+09:00 Raúl Gutiérrez Segalés <rgs@itevenworks.net>:

> Hi,
>
>
> On 29 April 2014 01:30, Jung Young Seok <jung.youngseok@gmail.com> wrote:
>
> > Dear Zookeeper-user,
> >
> > We have a 3-node ZooKeeper cluster.
> > ZooKeeper is used as an application lock coordinator.
> > A client creates a node (key) when it needs a lock, then releases the
> > lock and deletes the node (key) when it is done.
> >
> > Strangely, zk_watch_count keeps increasing.
> > On the leader, zk_watch_count has reached 6804.
> >
> > ==================================================================
> > Detailed information is below. (mntr)
> >
> > 1. Zoo-1 (Follower)
> > Status Information: OK  Zookeeper State: follower
> > zk_avg_latency 2
> > zk_max_latency 215
> > zk_min_latency 0
> > zk_packets_received 9001127
> > zk_packets_sent 9027569
> > zk_num_alive_connections 3
> > zk_outstanding_requests 0
> > zk_server_state follower
> > zk_znode_count 12
> > zk_watch_count 1786
> > zk_ephemerals_count 2
> > zk_approximate_data_size 525
> > zk_open_file_descriptor_count 29
> > zk_max_file_descriptor_count 4096
> > Performance Data: zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> >
> > 2. Zoo-2 (Follower)
> > Status Information: OK  Zookeeper State: follower
> > zk_avg_latency 0
> > zk_max_latency 6
> > zk_min_latency 0
> > zk_packets_received 3539
> > zk_packets_sent 3538
> > zk_num_alive_connections 2
> > zk_outstanding_requests 0
> > zk_server_state follower
> > zk_znode_count 12
> > zk_watch_count 0
> > zk_ephemerals_count 2
> > zk_approximate_data_size 525
> > zk_open_file_descriptor_count 28
> > zk_max_file_descriptor_count 4096
> > Performance Data: zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> >
> > 3. Zoo-3 (Leader)
> > Status Information: OK  Zookeeper State: leader
> > zk_avg_latency 1
> > zk_max_latency 214
> > zk_min_latency 0
> > zk_packets_received 21575604
> > zk_packets_sent 21638420
> > zk_num_alive_connections 4
> > zk_outstanding_requests 0
> > zk_server_state leader
> > zk_znode_count 18
> > zk_watch_count 6804
> > zk_ephemerals_count 5
> > zk_approximate_data_size 954
> > zk_open_file_descriptor_count 32
> > zk_max_file_descriptor_count 4096
> > zk_followers 2
> > zk_synced_followers 2
> > zk_pending_syncs 0
> > Performance Data: zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> >
> > ==================================================================
> >
> > My questions are:
> > 1. Is it normal that zk_watch_count has reached 6804?
> >
> > 2. Why does zk_watch_count keep increasing?
> > - We use Tomcat + Apache Curator + ZooKeeper 3.4.6
> >
> > 3. Would it cause trouble if zk_watch_count grows too large?
> >
> > 4. Is there any way to reduce zk_watch_count?
> >
> >
> You can introspect the watches via wchs (summary), wchc (watches by
> session) and wchp (watches by path). That'll give you an idea of what's
> going on. For example, on one of my servers:
>
> $ echo wchp | nc localhost 2181
> /messaging/00/0019/L2383
>         0x45aa3508a3ab77
>         0x45aa35089f8dce
>         0x45aa3508a2837d
> /search/member_0000283539
>         0x145aa2cc2b345d7
> /messaging/00/0019/L2384
>         0x45aa35089f8de4
> ...
>
>
> -rgs
>
