couchdb-user mailing list archives

From Fredrik Widlund <>
Subject RE: CouchDB and Hadoop
Date Mon, 19 Apr 2010 08:13:14 GMT

Yes, a distributed file system would just sync at a lower level. I'm not proposing this; I
was just commenting on the Hadoop thread. For me, though, it would actually be relevant to
at least consider using file-system synchronization, if, against all odds, it would work. I'll
certainly look at couchdb-lounge in more detail.

Kind regards,
Fredrik Widlund

-----Original Message-----
From: Randall Leeds []
Sent: 16 April 2010 21:06
Subject: Re: CouchDB and Hadoop

Hey Fredrik,

I'm one of the couchdb-lounge developers. I'd like to understand
better what your performance concerns are. Why are you concerned about
replicating a large number of changes? A distributed file system would
be doing the same thing, but at a lower level. If such a system were to
work, you'd be saving only HTTP and JSON overhead versus replication. If
the replicator is too slow, that is something that can possibly be
improved. If you're concerned about the runtime impact of replication,
that is tunable via the [replicator] configuration section.
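To make that concrete: replication in CouchDB of this era is started by POSTing to the `_replicate` endpoint. Here's a minimal sketch, assuming a CouchDB at `localhost:5984` and a made-up remote replica URL (both are assumptions, not from the thread):

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumed local CouchDB; adjust as needed


def replication_request(source, target, continuous=True):
    """Build the JSON body for a POST to /_replicate.

    continuous=True keeps the replication running as changes arrive,
    which matches the HA-mirroring use case discussed in the thread.
    """
    return {"source": source, "target": target, "continuous": continuous}


def start_replication(source, target):
    """POST the replication request to CouchDB's _replicate endpoint."""
    body = json.dumps(replication_request(source, target)).encode()
    req = urllib.request.Request(
        COUCH + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    return json.loads(urllib.request.urlopen(req).read())


if __name__ == "__main__":
    # Requires a running CouchDB; mirrors "db" to a hypothetical remote replica.
    print(start_replication("db", "http://replica.example.com:5984/db"))
```

The [replicator] section of the server's ini files is where knobs like retry behavior live; check your version's default.ini for the exact option names.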

couchdb-lounge already uses nginx for distributing simple GET and PUT
operations to documents and a python-twisted daemon to handle views.
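The routing idea behind that sharding can be illustrated in a few lines: hash the document id to pick a backend node, so every GET and PUT for the same id lands on the same shard. This is a sketch of the concept only, not lounge's actual code; the shard map and node names are made up:

```python
import hashlib

# Hypothetical shard map: shard index -> backend CouchDB node.
SHARDS = [
    "http://couch0:5984/db",
    "http://couch1:5984/db",
    "http://couch2:5984/db",
    "http://couch3:5984/db",
]


def shard_for(doc_id):
    """Pick a backend for a doc id by hashing it.

    The mapping is deterministic, so reads and writes for the same
    document always go to the same node.
    """
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]


# Every request for the same id is routed identically.
assert shard_for("user:42") == shard_for("user:42")
```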
The twisted daemon has configurable caching (with the one caveat that
the cache is currently unbounded, so the daemon needs to be restarted
periodically... I should really fix this :-P). It should be possible
to chain any standard nginx caching modules in front of the lounge
proxy module.
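For instance, nginx's stock proxy-cache directives could sit in front of the lounge proxy to cache GETs. A sketch (the cache path, zone name, and upstream address are assumptions, not lounge defaults):

```nginx
# Cache GET responses in front of the lounge proxy (illustrative values).
proxy_cache_path /var/cache/nginx/lounge levels=1:2
                 keys_zone=lounge:10m max_size=1g inactive=10m;

server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:6984;   # assumed lounge proxy address
        proxy_cache lounge;
        proxy_cache_valid 200 1m;           # cache successful responses briefly
        proxy_cache_methods GET HEAD;       # never cache PUT/POST
    }
}
```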

If you have other concerns or would like to investigate more, ping me
on irc (tilgovi) or join us over on


On Fri, Apr 16, 2010 at 09:54, Fredrik Widlund
<> wrote:
> Thanks, I will! We will actually use nginx for "dumb" caching, but add an API layer in
> between the cache and the couch. We also need to mirror data to provide HA, and the
> performance issues we're having are more about constantly replicating a large number of
> changes than accelerating the reads. I'm not sure if couchdb-lounge would address this.
> We did stumble upon a bug that's being addressed (we were also provided with a temporary
> work-around), and it could be due to that, but with a quite modest load we periodically kept
> hitting the roof of an E5520 quad-core, so I'm a bit worried about the performance aspect.
> Kind regards,
> Fredrik Widlund
> -----Original Message-----
> From: David Coallier []
> Sent: 16 April 2010 18:06
> To:
> Subject: Re: CouchDB and Hadoop
> On 16 April 2010 16:22, Fredrik Widlund <> wrote:
>> Well, we're building a solution on Couch and replication on a relatively large scale,
>> and saying "it just works" doesn't really describe it for us. I really like the Couch
>> design, but it's a bit of a challenge making it work for us. I can describe the case if
>> you like.
>> Also, we already have a decentralized distributed file system layer (which I suppose is
>> often a natural part of a cloud solution), so if we could run Couch on top of that it
>> would lessen the complexity of the overall solution.
>> In any case, it was a quick comment on the Hadoop question, and maybe it just wouldn't
>> work that way. In general, I guess you could discuss the atomic-operation/locking and
>> performance implications of moving synchronization to a lower abstraction layer.
> <snip>
> You should look into couchdb-lounge. It should resolve most of your
> "sharding" replication issues :)
> --
> David Coallier
