zookeeper-user mailing list archives

From Pramod Biligiri <pramodbilig...@gmail.com>
Subject Re: Partitioned Zookeeper
Date Mon, 19 May 2014 03:25:00 GMT
Hi,
[Let me know if you want this thread moved to the Dev list (or even to
JIRA). I was only seeing automated mails there, so I thought I'd go ahead
and post here.]

I have been looking at the codebase for the last couple of days (see my
notes on it here:
https://docs.google.com/document/d/1TcohOWsUBXehS-E50bYY77p8SnGsF3IrBtu_LleP80U/edit
).

We are planning to do a proof-of-concept of this partitioning idea as part
of a class project, and to measure any possible performance gains. Since
we're new to Zookeeper and short on time, it may not be the *right* way to
do it, but I hope it can give some pointers for the future.

Design approaches to implement a partitioned Zookeeper

For starters, let's assume we only parallelize accesses to paths under
different top-level prefixes, e.g. /app1/child1, /app2/child1, /config,
etc.

Possible approach:

- Have a different tree object for each top-level node (/app1, /app2 etc.).
This loosely corresponds to a container in the Wiki page [1], and to the
DataTree class in the codebase.

- As soon as a request comes in, associate it with one of the trees. Since
every request carries a path, this mapping is straightforward.

- Then, all the queues that are used to process requests should operate in
parallel on these different trees. This can be done by having multiple
queues - one for each container. A rough sketch of this routing follows below.
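Here's a minimal Java sketch of what I mean. None of these names
(PartitionedDispatcher, containerFor, the stand-in Request class) come from
the ZooKeeper codebase; they're just placeholders showing requests being
routed to a per-container queue, with one worker per container applying
changes to its own tree:

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not from the ZooKeeper codebase: route each incoming
// request to a per-container queue, keyed by the top-level path component
// (/app1, /app2, /config, ...). A dedicated worker per container would drain
// its queue and apply changes to its own tree, so containers run in parallel.
public class PartitionedDispatcher {

    // Stand-in for the real request type (org.apache.zookeeper.server.Request).
    public static class Request {
        final String path;
        public Request(String path) { this.path = path; }
    }

    private final Map<String, BlockingQueue<Request>> queues = new ConcurrentHashMap<>();

    // "/app1/child1" -> "/app1", "/config" -> "/config"
    static String containerFor(String path) {
        int secondSlash = path.indexOf('/', 1);
        return secondSlash == -1 ? path : path.substring(0, secondSlash);
    }

    public void submit(Request r) throws InterruptedException {
        BlockingQueue<Request> q = queues.computeIfAbsent(
                containerFor(r.path), k -> new LinkedBlockingQueue<>());
        q.put(r);
    }
}

In the real server this would have to hook into the RequestProcessor chain
somewhere, which is exactly one of the open issues below.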

Potential issues:

- Whether ZK code is designed to work with multiple trees instead of just
one

- Whether the queuing process (which uses RequestProcessors) is designed to
handle multiple queues

- Making sure performance actually improves, and does not degrade! (A rough
throughput test sketch is below.)
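For that last point, something like the following client-side test is what
I have in mind: N threads, each hammering creates under its own top-level
prefix, comparing aggregate ops/sec against stock ZooKeeper on the same
hardware. The connect string, thread count and op count are obviously just
placeholders:

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Rough throughput test: each thread writes under its own top-level prefix,
// so a partitioned server (if it works) should show higher aggregate ops/sec.
public class PrefixThroughputTest {

    public static void main(String[] args) throws Exception {
        final int threads = 4;
        final int opsPerThread = 5000;

        Thread[] workers = new Thread[threads];
        long start = System.currentTimeMillis();

        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try {
                    // wait for the session to reach SyncConnected before writing
                    CountDownLatch connected = new CountDownLatch(1);
                    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {
                        if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                            connected.countDown();
                        }
                    });
                    connected.await();

                    String prefix = "/app" + id;
                    try {
                        zk.create(prefix, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                CreateMode.PERSISTENT);
                    } catch (KeeperException.NodeExistsException ignore) {
                        // parent already exists from a previous run
                    }
                    for (int i = 0; i < opsPerThread; i++) {
                        zk.create(prefix + "/n", new byte[64], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                CreateMode.PERSISTENT_SEQUENTIAL);
                    }
                    zk.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }

        long elapsedMs = System.currentTimeMillis() - start;
        System.out.printf("%d creates in %d ms (%.0f ops/sec)%n",
                threads * opsPerThread, elapsedMs,
                threads * opsPerThread * 1000.0 / elapsedMs);
    }
}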

Discussion:

- Where is the performance benefit actually going to come from?

Intuitively, parallel trees might seem to give a benefit, but since each
node logs all change records to disk before applying them, isn't disk the
throughput bottleneck? If I remember right, the ZK paper says that with
proper configuration they are able to make ZK I/O bound.

So along with having separate trees and associated processing, should we
also have separate logging to disk for each tree? Will that actually
improve write throughput?
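As a thought experiment, here is a tiny sketch of what per-container
transaction logs could look like. Again, nothing here is from the codebase
(the real TxnLog/FileTxnLog classes look nothing like this); it's only to
show where the separate logging would slot in:

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one append-only log file per top-level node.
public class PerContainerTxnLog {

    private final File baseDir;
    private final Map<String, OutputStream> logs = new ConcurrentHashMap<>();

    public PerContainerTxnLog(File baseDir) {
        this.baseDir = baseDir;
    }

    // container is a top-level path such as "/app1"; serializedTxn is the
    // already-serialized change record for that container's tree.
    public void append(String container, byte[] serializedTxn) throws IOException {
        OutputStream out = logs.computeIfAbsent(container, c -> {
            try {
                // e.g. <baseDir>/app1.log, <baseDir>/app2.log
                File logFile = new File(baseDir, c.substring(1) + ".log");
                return new BufferedOutputStream(new FileOutputStream(logFile, true));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        synchronized (out) {
            out.write(serializedTxn);
            out.flush(); // a real log would also fsync (and group-commit) before acking
        }
    }
}

One caveat: unless each container's log actually sits on its own device,
all those flushes still contend for the same disk, so I'm not sure separate
files alone will buy anything.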

References:

1. The wiki page:
http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper

2. The JIRA discussion: https://issues.apache.org/jira/browse/ZOOKEEPER-646

3. Blog post, section "Scalability and Hashing Zookeeper clusters":
http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cage

Thanks,
Pramod
-- 
http://twitter.com/pramodbiligiri


On Fri, May 16, 2014 at 10:56 PM, Pramod Biligiri
<pramodbiligiri@gmail.com> wrote:

> Thanks Michi,
> That was a very useful link! :)
>
> Pramod
>
>
> On Fri, May 16, 2014 at 3:37 PM, Michi Mutsuzaki <michi@cs.stanford.edu> wrote:
>
>> Hi Pramod,
>>
>> No it has not been implemented, and I'm not aware of any recipes.
>> There is an open JIRA for this feature.
>>
>> https://issues.apache.org/jira/browse/ZOOKEEPER-646
>>
>> On Thu, May 15, 2014 at 12:59 PM, Pramod Biligiri
>> <pramodbiligiri@gmail.com> wrote:
>> > Hi,
>> > The Zookeeper wiki talks about Partitioned Zookeeper:
>> >
>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/PartitionedZooKeeper
>> >
>> > I wanted to know if that has already been implemented or not. If not, are
>> > there some recipes which can make Zookeeper behave in that way?
>> >
>> > Thanks.
>> >
>> > Pramod
>>
>
>
