incubator-couchdb-user mailing list archives

From Robert Dionne <dio...@dionne-associates.com>
Subject Re: How fast do CouchDB propagate changes to other nodes?
Date Sat, 18 Dec 2010 12:00:48 GMT



On Dec 17, 2010, at 6:07 PM, Randall Leeds wrote:

> How fast:
> How fast is almost meaningless to ask, since it depends a lot on
> what's between CouchDB and your chat clients.
> 
> After a change is written to the database, the internal change

I've never built a chat server, but I would imagine one wouldn't want to write to disk
first; getting the message out would have higher priority, especially if logging
wasn't required.

> listener will get the update almost immediately. From there it's
> pushed down the ?feed=continuous long poll to a _changes consumer
> (e.g. another couch doing pull replication or a client) or two http
> requests (usually with keep-alive) to a push replication destination
> CouchDB.
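A rough sketch of the two push-replication requests mentioned above, as I understand CouchDB's replication protocol: the replicator asks the target which revisions it lacks (POST /{db}/_revs_diff) and then writes those revisions (POST /{db}/_bulk_docs with new_edits set to false). This only builds the JSON bodies. The function names and the msg-42 document are made up for illustration, and fetching the document bodies from the source is a separate step that is not shown.

```python
import json


def revs_diff_body(changes):
    """Build the body of POST /{target_db}/_revs_diff from _changes rows.

    Maps each document id to the revisions the source has, so the target
    can answer with the ones it is missing.
    """
    body = {}
    for row in changes:
        revs = [c["rev"] for c in row["changes"]]
        body.setdefault(row["id"], []).extend(revs)
    return body


def missing_revs(revs_diff_response):
    """Flatten a _revs_diff response into (doc_id, rev) pairs still to be sent."""
    return [
        (doc_id, rev)
        for doc_id, info in revs_diff_response.items()
        for rev in info.get("missing", [])
    ]


def bulk_docs_body(docs):
    """Build the body of POST /{target_db}/_bulk_docs.

    new_edits=False asks the target to store the source's revision ids as-is
    instead of generating new ones. The docs are assumed to have been fetched
    from the source together with their revision history.
    """
    return {"docs": docs, "new_edits": False}


if __name__ == "__main__":
    row = {"seq": 12, "id": "msg-42", "changes": [{"rev": "1-abc"}]}
    print(json.dumps(revs_diff_body([row])))  # {"msg-42": ["1-abc"]}
```

Keep-alive matters here because, for a single chat message, these two small requests are the whole per-message cost on the wire.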
> 
> For *one* pull replication or _changes hop it is (at best, feed is
> up-to-date, consumer is waiting for the next entry from the server)
> the time for the producer (couchdb) and consumer (couchdb, client,
> etc) to (de)serialize and send/receive one line of JSON text. Nothing
> more. This can be really fast.
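For a client reading ?feed=continuous directly, each change arrives as one line of JSON, so the consumer loop is little more than "read a line, parse it". A minimal sketch, assuming blank lines are heartbeats (sent when the heartbeat parameter is set) and that a closing {"last_seq": ...} line means the server ended the feed; parse_continuous_changes is a hypothetical name.

```python
import json


def parse_continuous_changes(lines):
    """Yield one change dict per non-empty line of a continuous _changes feed.

    Accepts str or bytes lines. Blank lines (heartbeats) are skipped, and a
    {"last_seq": ...} line stops the iteration because the feed has closed.
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line:
            continue  # heartbeat newline keeping the connection alive
        change = json.loads(line)
        if "last_seq" in change:
            return  # server closed the feed (e.g. on timeout)
        yield change


if __name__ == "__main__":
    sample = [
        '{"seq":1,"id":"msg-1","changes":[{"rev":"1-a"}]}\n',
        "\n",
        '{"seq":2,"id":"msg-2","changes":[{"rev":"1-b"}]}\n',
        '{"last_seq":2}\n',
    ]
    for change in parse_continuous_changes(sample):
        print(change["seq"], change["id"])
```

In practice the lines could come from iterating over a urllib.request.urlopen() response for a URL such as http://localhost:5984/chat/_changes?feed=continuous&heartbeat=10000 (the chat database name is hypothetical).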
> 
> Should you use CouchDB?
> 
> Let's assume this project gets interesting and you need multiple nodes
> like you described. You could partition your clients between CouchDB
> nodes using a consistent hash on a normalized name of the user to
> divide up the resources of a cluster. You would then filter the
> replications such that each Couch only receives messages intended for
> its connected users.
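One way the partitioning idea might look, as a sketch only: a consistent-hash ring with virtual nodes maps a normalized user name to a CouchDB node, so adding or removing a node only moves the users that node owned. The filtered pull replication body uses the real /_replicate fields (source, target, continuous, filter, query_params), but the "chat/by_home_node" filter function, the "home_node" field stamped on each message, and the node names are all hypothetical.

```python
import bisect
import hashlib
import unicodedata


def normalize_user(name):
    """Hypothetical normalization: Unicode NFKC, trimmed, case-folded."""
    return unicodedata.normalize("NFKC", name).strip().casefold()


def _hash(key):
    # Stable across processes, unlike Python's built-in hash() for str.
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring; each node owns many virtual points to even out load."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, user):
        """Return the node owning the first ring point clockwise from the user's hash."""
        point = _hash(normalize_user(user))
        idx = bisect.bisect(self._points, point) % len(self._ring)
        return self._ring[idx][1]


def filtered_pull(source_url, local_db, node):
    """Body for POST /_replicate on `node`: pull only messages homed on that node.

    Assumes each message doc carries a hypothetical "home_node" field, set by
    the writer with HashRing.node_for(recipient), and a hypothetical filter
    "chat/by_home_node" on the source that keeps docs whose home_node equals
    the "node" query parameter.
    """
    return {
        "source": source_url,
        "target": local_db,
        "continuous": True,
        "filter": "chat/by_home_node",
        "query_params": {"node": node},
    }
```

Stamping the home node on the document when it is written keeps the filter function trivial; the filter never needs to know the ring layout.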
> 
> The biggest hurdle here is checkpointing. Since replication needs to
> know where to begin if it's restarted, you need to create a
> replication topology or strategy that is both resilient to network
> outages and doesn't require checking the entire chat history of
> everything should you need to change your replication pattern (in
> response to failure, scaling, reconfiguration, etc).
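To make the checkpointing problem concrete, here is a sketch for a custom _changes consumer (not CouchDB's own replicator, which keeps its checkpoints in _local documents). It remembers the last processed sequence per (source, target) pair and resumes with since=. It also shows the weakness described above: a pair that has never been seen, for example after the topology changes, starts from 0 and re-reads the whole history. CheckpointStore and changes_url are hypothetical names.

```python
import json
import os
import tempfile
from urllib.parse import urlencode


class CheckpointStore:
    """Hypothetical helper: last processed update sequence per (source, target) pair."""

    def __init__(self, path):
        self.path = path
        try:
            with open(path) as f:
                self._data = json.load(f)
        except FileNotFoundError:
            self._data = {}

    @staticmethod
    def _key(source, target):
        return f"{source} -> {target}"

    def last_seq(self, source, target):
        # A pair never seen before starts from 0, i.e. the entire history.
        return self._data.get(self._key(source, target), 0)

    def save(self, source, target, seq):
        self._data[self._key(source, target)] = seq
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self._data, f)
        os.replace(tmp, self.path)  # atomic rename on the same filesystem


def changes_url(source, since, heartbeat_ms=10000):
    """Resume a continuous _changes feed just after the checkpointed sequence."""
    query = urlencode({"feed": "continuous", "since": since, "heartbeat": heartbeat_ms})
    return f"{source}/_changes?{query}"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoints.json")
    src, dst = "http://couch-a:5984/outbox", "http://couch-b:5984/inbox"
    CheckpointStore(path).save(src, dst, 42)
    print(changes_url(src, CheckpointStore(path).last_seq(src, dst)))
```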
> 
> If I were doing it this way I would maybe keep an "inbox" and
> "outbox" database on every node. You could even name outbox something
> like "ramdisk/outbox" and mount a RAM disk as "ramdisk" in the CouchDB
> storage directory so that "outbox.couch" gets stored in there. When
> your clients send messages you could store them in the outbox and
> trust that when they arrive at the right "inbox" on some CouchDB they
> will be persisted there. You could even round robin through many
> outboxes, or have one per hour or so. This keeps your storage down and
> opens up the interesting replication patterns for pushing messages
> through a redundantly connected graph of Couches without building up a
> massive database that will be hard to replicate (except the inboxes at
> the edges).
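A small sketch of the outbox naming ideas: one outbox per UTC hour, or round robin over a fixed set. CouchDB database names must start with a lowercase letter and may contain only lowercase letters, digits and _$()+-/, so the timestamp uses digits and an underscore only. A "/" in a database name has to be escaped as %2F in request URLs. The prefix, helper names and inbox URL are assumptions for illustration.

```python
import re
from datetime import datetime, timezone
from urllib.parse import quote

# CouchDB's documented rule for database names.
VALID_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def hourly_outbox(ts, prefix="ramdisk/outbox"):
    """Outbox database for the UTC hour containing the timezone-aware datetime `ts`."""
    return f"{prefix}_{ts.astimezone(timezone.utc):%Y%m%d%H}"


def round_robin_outbox(message_number, count, prefix="ramdisk/outbox"):
    """Alternative: spread messages over a fixed set of `count` outboxes."""
    return f"{prefix}_{message_number % count}"


def url_path(db_name):
    """Escape '/' inside a database name as %2F for use in a request path."""
    return "/" + quote(db_name, safe="")


def push_to_inbox(outbox, inbox_url):
    """Body for POST /_replicate: continuously push one outbox to a remote inbox."""
    return {"source": outbox, "target": inbox_url, "continuous": True}


if __name__ == "__main__":
    name = hourly_outbox(datetime.now(timezone.utc))
    print(name, url_path(name))
    print(push_to_inbox(name, "http://couch-b:5984/inbox"))
```

Hourly outboxes make cleanup cheap: once an hour's outbox has been fully replicated, the whole database can be deleted rather than compacted.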
> 
> Using CouchDB for a chat server is an interesting idea, but I don't
> know of anyone using CouchDB for replication that is this 'gossipy'. I
> think BigCouch might do some all-to-all node replication for

BigCouch has an internal layer called rexi [1] that improves on Erlang's built-in rex and is designed
for the case where you want to spawn lots of remote processes. You might find it useful; it
was built to be independent of BigCouch.


[1] https://github.com/cloudant/rexi



> keeping cluster information and database metadata up to date around
> the cluster, but that information tends to be small and changes
> infrequently.
> 
> However, to me this sounds like a lot of work for something that might
> be better solved using technologies like zeromq, particularly if
> logging all messages is optional.
> 
> Anyway, I'm happy to talk about all of this further since I think it's
> kind of fascinating. I've been thinking a lot recently about how flood

I'm curious, is flood replication what the name implies? Broadcasting?
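One common reading of "flood" in networking is plain flooding. Whether that is exactly what Randall meant is an assumption. Under that reading, each node forwards a message to all its neighbours the first time it sees it and drops later copies. The simulation below uses a hypothetical graph of node names. It shows the trade-off in a redundantly connected graph: delivery takes shortest-path hops, but every reached node sends one copy over each of its links.

```python
from collections import deque


def flood(graph, origin):
    """Simulate flooding one message from `origin` through an undirected graph.

    A node forwards the message to all its neighbours the first time it sees
    it and drops later copies (in practice, by remembering message ids).
    Returns (hops from origin per reached node, total sends). Each reached
    node forwards exactly once per neighbour, so sends equal the sum of their degrees.
    """
    hops = {origin: 0}
    sends = 0
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in graph[node]:
            sends += 1
            if peer not in hops:  # first copy: deliver it and forward later
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops, sends


if __name__ == "__main__":
    couches = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["a", "c"]}
    print(flood(couches, "a"))
```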



> replication could function efficiently in a dynamic environment, but
> it's mostly open questions right now.
> 
> I hope that provides some direction and thought guidance. Please let
> me know if anything didn't make sense or you have other interesting
> ideas or questions. I think it could be made to work, but it's not a

I agree that chat is an interesting use case, and I suspect it would push on CouchDB in a number
of ways. It would be a stretch to call it real-time. There's a proof-of-concept chat demo [2] that
was done some time ago and might be worth looking at, though I'm not sure it has been developed
further. There's also ejabberd [3], which I haven't used in a while but which looked interesting
when it first came out.

[2] https://github.com/jchris/toast
[3] http://www.process-one.net/en/ejabberd/

> natural fit at scale for the existing replication model at this time.
> 
> Cheers,
> Randall
> 
> On Fri, Dec 17, 2010 at 13:57, Johnny Weng Luu
> <johnny.weng.luu@gmail.com> wrote:
>> Hi
>> 
>> I'm designing a chat app and I thought about this design:
>> 
>> Clients are connected to the nearest CouchDB and listening for changes (chat
>> texts).
>> If one client posts a new message, it will be inserted into that client's
>> CouchDB node.
>> The change will be propagated to the other CouchDB nodes in the cluster.
>> The clients connected to those CouchDB nodes will get that message.
>> 
>> But this design is heavily dependent on how fast CouchDB propagates changes
>> to other nodes.
>> Is this a good design for CouchDB, or is CouchDB not intended for this?
>> 
>> How else could you design a chat application with CouchDB?
>> 
>> /Johnny
>> 

