zookeeper-user mailing list archives

From lists lists <li...@beobal.com>
Subject Re: building a distributed mail-store UID generator using zookeeper
Date Thu, 26 Jan 2012 12:51:57 GMT
cc'ing the list with my reply to Ioan's offline mail...

According to [1], the typical size of a znode is 40-80 bytes; h1 adds
very little to this, as the value for a given key is simply stored as
an 8-byte array in the znode's data. Doing some very rough maths: if
your keys (the mailbox ids) are 128-bit UUIDs, let's allow 100 bytes
for Zookeeper's internal znode data structure, plus 8 bytes for the
incrementing value, plus another 16 bytes of wiggle room. By my
calculations, that means each Zookeeper server requires around
125 MB of memory per 1,000,000 keys. This is a pretty rough
approximation, as I'm not familiar with what overhead Zookeeper adds
beyond the znode itself. But as h1 doesn't use any of Zookeeper's
additional features, like watches, I'd say it's pretty lightweight.
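
Spelled out, that back-of-the-envelope estimate is:

      100 bytes  (znode structure)
    +   8 bytes  (counter value)
    +  16 bytes  (wiggle room)
    = 124 bytes per key

    124 bytes x 1,000,000 keys ~= 124 MB per server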

While we've not seriously tested h1 at the scale you're talking
about, it has been running in our production environment for around
two years now, serving millions of requests per day, albeit for only
a few thousand distinct keys. In this setup, it runs on very modest
(virtualised) hardware with minimal resource consumption. My
intuition says that scaling a single h1 cluster up to tens of
millions of keys shouldn't be a problem, but if/when we start to hit
its boundaries, our plan is to mitigate that with multiple h1
clusters, each responsible for a particular set of keys, with a
simple hashing algorithm to determine which cluster to route a given
request to.
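
We haven't built that routing layer yet, but the idea is no more
complicated than something like this (an illustrative sketch, not h1
code; the class and method names are made up):

    import java.util.UUID;

    public class ClusterRouter {
        // Hypothetical helper: pick one of numClusters independent h1
        // clusters for a given mailbox id.
        static int clusterFor(UUID mailboxId, int numClusters) {
            // mask the sign bit so the index stays non-negative even
            // when hashCode() returns a negative value
            return (mailboxId.hashCode() & Integer.MAX_VALUE) % numClusters;
        }

        public static void main(String[] args) {
            UUID id = UUID.randomUUID();
            System.out.println("mailbox " + id + " -> cluster "
                    + clusterFor(id, 4));
        }
    }

Since a given key always hashes to the same cluster, the gap-free
guarantee for that key is preserved.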

When we first built h1, we had some rather specific requirements
which led us to the design we came up with. First, each time an id is
requested, it must be incremented by *exactly* 1; the design of our
larger system means that we cannot tolerate any gaps in the sequence
of ids generated for a given key. Secondly, and related to that, we
needed to ensure that whatever backend we used for storing the
sequences was highly durable. That is, before a sequence is returned
to a client, it must have been written to disk by the h1 server.
Thirdly (and slightly less crucially), we wanted HA/fault tolerance
and the ability to survive the loss of a single server, perhaps at
the price of degraded performance. These three requirements are the
primary reason we went with Zookeeper over other solutions like
Redis.

Of course, meeting these requirements involves some tradeoffs - h1
does not perform brilliantly when there are many concurrent requests
to increment the sequence for a single key. The optimistic
concurrency approach we employ in the case of such contention is
pretty brutal and can add quite a few ms to those requests.
Fortunately for us, that's not typical in our system, and requests
are generally well distributed across the key space.
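
To give a flavour of the approach, here is a minimal sketch of the
standard Zookeeper compare-and-set pattern - not h1's actual code,
and it assumes the counter znode already exists:

    import java.nio.ByteBuffer;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class SequenceIncrementer {
        // Atomically increment the 8-byte counter stored at path by
        // exactly 1, retrying on contention.
        static long increment(ZooKeeper zk, String path)
                throws KeeperException, InterruptedException {
            while (true) {
                Stat stat = new Stat();
                // read the current value and its znode version
                byte[] data = zk.getData(path, false, stat);
                long next = ByteBuffer.wrap(data).getLong() + 1;
                byte[] updated = ByteBuffer.allocate(8).putLong(next).array();
                try {
                    // succeeds only if nobody wrote since our read
                    zk.setData(path, updated, stat.getVersion());
                    return next;
                } catch (KeeperException.BadVersionException contended) {
                    // another client won the race - loop and try again
                }
            }
        }
    }

Under contention, every loser of the race pays an extra read/write
round trip, which is where those extra milliseconds come from.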

Another requirement we had was to access h1 from non-java processes,
hence the HTTP interface. Obviously, the verbosity of HTTP adds quite
an overhead and lowers throughput, but the ubiquity of the protocol
and the ease with which we have been able to integrate it into our
other components has more than made up for this. Of course, there's
nothing stopping someone from adding alternative interfaces to h1
should they require them (we're also heavy users of Apache Thrift[2]
and have thought about adding a Thrift interface).
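
Any HTTP client will do; in plain Java it looks something like this
(the host and URL path here are made up for illustration - the real
h1 endpoint layout may differ):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class H1Example {
        public static void main(String[] args) throws Exception {
            // hypothetical endpoint: request the next id for a key
            URL url = new URL("http://h1.example.com:8080/sequence/my-mailbox-id");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                System.out.println("next id: " + in.readLine());
            }
        }
    }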

I hope this information is useful to you; if I can help at all, feel
free to ping me anytime. Forks/pull requests are also always welcome :)

Cheers,
Sam


[1]
http://zookeeper-user.578899.n2.nabble.com/Size-of-a-znode-in-memory-td5462569.html
[2] http://thrift.apache.org

On 25 January 2012 20:12, Norman Maurer <norman@apache.org> wrote:

> Thanks for the link... After thinking more about it, I guess using the
> version of the znode would be the easiest solution and would fulfill
> all the needs. Having Integer.MAX_VALUE should also be good enough, as
> this is per mailbox.
>
> Bye,
> Norman
>
>
> 2012/1/25 Ioan Eugen Stan <stan.ieugen@gmail.com>:
> > 2012/1/25 Norman Maurer <norman@apache.org>:
> >> Hi Ioan,
> >>
> >> the link to [2] just points to the archive browser.
> >
> > Sorry, here is the update:
> >
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201201.mbox/%3CCAJwFCa2uYKHSX0r_KsuPn19A8W3AntTUOPDcWgN%2B6zw8x3P9hA%40mail.gmail.com%3E
> >
> > It's the thread about "Use cases for ZooKeeper".
> >
> > Thanks Norman,
> >
> > --
> > Ioan Eugen Stan
> > http://ieugen.blogspot.com/
>
