zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Shared block storage via ZooKeeper
Date Wed, 13 Jul 2011 17:31:57 GMT
See BookKeeper.

The analogy is this:

ZK => Chubby
BookKeeper => distributed log
Application => Application.
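
A minimal sketch of the layering in this analogy, assuming an in-memory stand-in for the log. The names InMemoryLog and BlockStore are hypothetical and are not the BookKeeper API; a real log layer would replicate and persist entries across servers. The application (a toy block store here) appends every write to the log first and rebuilds its state by replaying committed entries in order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # 4k blocks, as in the original question


@dataclass
class InMemoryLog:
    """Hypothetical stand-in for the distributed log layer.

    It only keeps entries in a local list; replication and durability
    are deliberately left out of this sketch.
    """
    entries: List[bytes] = field(default_factory=list)

    def append(self, entry: bytes) -> int:
        """Append an entry and return its id (position in the log)."""
        self.entries.append(entry)
        return len(self.entries) - 1

    def read_from(self, first_id: int) -> List[Tuple[int, bytes]]:
        """Return (id, entry) pairs starting at first_id, in log order."""
        return list(enumerate(self.entries))[first_id:]


class BlockStore:
    """Hypothetical application: block state derived purely from the log."""

    def __init__(self, log: InMemoryLog) -> None:
        self.log = log
        self.blocks: Dict[int, bytes] = {}
        self.applied = 0  # id of the next log entry to apply

    def write(self, block_no: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        # Log first (write-ahead), then apply whatever is in the log.
        self.log.append(block_no.to_bytes(8, "big") + data)
        self.catch_up()

    def catch_up(self) -> None:
        """Replay log entries that have not been applied yet."""
        for entry_id, entry in self.log.read_from(self.applied):
            block_no = int.from_bytes(entry[:8], "big")
            self.blocks[block_no] = entry[8:]
            self.applied = entry_id + 1

    def read(self, block_no: int) -> bytes:
        # Blocks that were never written read back as zeros.
        return self.blocks.get(block_no, bytes(BLOCK_SIZE))


if __name__ == "__main__":
    log = InMemoryLog()
    primary = BlockStore(log)
    primary.write(7, b"x" * BLOCK_SIZE)

    # A second store recovers the same state by replaying the shared log.
    replica = BlockStore(log)
    replica.catch_up()
    print(replica.read(7) == primary.read(7))
```

In this sketch the coordination role (which node may write, where the log lives) is not shown at all; in the analogy above that part would belong to ZK.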

On Wed, Jul 13, 2011 at 10:17 AM, Yang <teddyyyy123@gmail.com> wrote:

> Actually, I was just thinking about this and tried to ask exactly the same
> question.
> Right now ZK is used to store small pieces of data such as shared config, and
> for locking/coordination, but since it has a replicated data store, it
> would be nice to use it to store large volumes of data directly.
> in fact from the "Paxos made live" paper:
> http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/paxos_made_live.pdf
>  page 3
> "We devoted effort to designing clean interfaces separating the Paxos
> framework, the database, and
> Chubby. We did this partly for clarity while developing this system, but
> also with the intention of reusing the
> replicated log layer in other applications. We anticipate future systems at
> Google that seek fault-tolerance
> through replication. We believe that a fault-tolerant log is a powerful
> primitive on which to build such
> systems."
> Essentially, in the Google Paxos implementation, application code can simply
> grab the latest committed log record and use it for whatever the
> application needs. If ZooKeeper abstracted out the messaging protocol and
> provided the committed transaction "stream" as the interface to
> applications, we could potentially use it for many applications, including
> data storage. Note that this is completely outside the current ZK data
> model (znodes, etc.); all we would use from ZK is the underlying committed
> transaction stream. This part of ZK could probably be provided as a library.
> yang
> On Wed, Jul 13, 2011 at 5:01 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>> Hi Simon, It is not entirely clear to me what you need zookeeper for in
>> this case. Are blocks replicated and you need to guarantee that the updates
>> are consistent across replicas?
>> On your observations, I'm quite sure people will have an opinion, so here
>> are my thoughts, which might not be representative of the whole community:
>> 1- You're right, we do not recommend using ZooKeeper directly as the
>> data store. ZooKeeper servers keep their state in memory.
>> 2- Cassandra already provides replication. Are you trying to strengthen
>> the guarantees of Cassandra? I don't get it...
>> 3- Sounds right that you could use BK as a journal, but it is not clear
>> which element is writing to the journal. Are you assuming a metadata manager
>> such as the namenode of HDFS?
>> 4- I'm not sure what this option means. Are you proposing that ZooKeeper
>> manage the metadata of the file system? If so, I don't find it entirely
>> unrealistic, since metadata updates are supposed to be small and the
>> performance of ZooKeeper should be good enough for your case, but it might
>> be awkward to have your block storage clients talking directly to ZooKeeper.
>> Changes to metadata management would imply in this case rolling out a new
>> version of the client application instead of just having the changes
>> implemented on the service side.
>> -Flavio
>> On Jul 13, 2011, at 12:02 PM, Simon Felix wrote:
>> Hello everyone
>> What is the best way to build a distributed, shared storage system on top
>> of ZooKeeper? I'm talking about block storage in the terabyte range (i.e.
>> store billions of 4k blocks). Consistency and availability are important, as is
>> throughput (both read & write). I need at least 50 MB/s with 3 nodes with
>> two regular SATA drives each for my application.
>> Some options I came up with:
>> 1. Use ZooKeeper directly as a data store (not recommended according to
>> the docs, and it really leads to abysmally bad performance; I tested that)
>> 2. Use Cassandra as data store
>> 3. Use BookKeeper as write-ahead log and implement my own underlying store
>> 4. Use ZooKeeper to create my own (probably buggy...) data store
>> What would you recommend? Are there other options?
>> Cheers,
>> Simon
>>   *flavio*
>> *junqueira*
>> research scientist
>> fpj@yahoo-inc.com
>> direct +34 93-183-8828
>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>> phone (408) 349 3300    fax (408) 349 3301
