couchdb-user mailing list archives

From Adam Kocoloski <>
Subject Re: CouchDB and Hadoop_
Date Mon, 19 Apr 2010 14:39:10 GMT
Thanks Fredrik.  I think I have a pretty good handle on what's happening and have replied in
detail in JIRA.  Best,


On Apr 19, 2010, at 10:22 AM, Fredrik Widlund wrote:

> Hi,
> Thanks,
> Fredrik
> -----Original Message-----
> From: Adam Kocoloski []
> Sent: 19 April 2010 16:05
> To:
> Subject: Re: CouchDB and Hadoop_
> Hi Fredrik, thanks for the details.  The CPU utilization does not sound normal at all.
> I have had a node replicating 30-75 updates/sec (unique documents, diurnal fluctuations)
> for several months now and it almost never uses more than 50% of one core of a
> virtualized e5410 box with 1.7G of RAM.
> I would definitely look into the crashes first and see if that resolves the giant
> fluctuations in CPU.  Is there a JIRA ticket I can follow? (I'm one of the developers
> of the replicator).
> Adam
> On Apr 19, 2010, at 4:07 AM, Fredrik Widlund wrote:
>> Hi,
>> The case I've tested so far is using couch in the following setup (which is a small
>> part of what would be a production level setup for us):
>> - two bidirectionally synced nodes
>> - <50 writes/s to node A, each updating a unique doc
>> - <50 writes/s to node B, each updating a unique doc
>> - <50 reads/s from each node
>> - regular compaction of the database containing the docs
>> The two nodes run on quad-core (e5520) CPUs with 16G RAM. CPU usage ramps down and up
>> to 400% (i.e. full load on all cores) every few seconds. Couch 0.11.0 crashes regularly,
>> which has been reported and is being worked on from what I understand. Also, the
>> replication tasks break and have to be restarted very often, probably due to the
>> problem above.
>> I've received a temporary patch as a possible work-around for the crashes. I haven't
>> tested this case with the work-around yet, but I would assume it sorts out the crashes,
>> though not the CPU load.
>> Kind regards,
>> Fredrik Widlund
>> -----Original Message-----
>> From: Randall Leeds []
>> Sent: 16 April 2010 21:06
>> To:
>> Subject: Re: CouchDB and Hadoop_
>> Hey Fredrik,
>> I'm one of the couchdb-lounge developers. I'd like to understand
>> better what your performance concerns are. Why are you concerned about
>> replicating a large number of changes? A distributed file system would
>> be doing the same thing but at a lower level. If such a system were to
>> work you'd be saving only HTTP and JSON overhead vs replication. If
>> the replicator is too slow, that is something that can possibly be
>> improved. If you're concerned about the runtime impact of replication
>> this is tunable via the [replicator] configuration section.
>> couchdb-lounge already uses nginx for distributing simple GET and PUT
>> operations to documents and a python-twisted daemon to handle views.
>> The twisted daemon has configurable caching (with the one caveat that
>> the cache is currently unbounded, so the daemon needs to be restarted
>> periodically.... I should really fix this :-P). It should be possible
>> to chain any standard nginx caching modules in front of the lounge
>> proxy module.
>> If you have other concerns or would like to investigate more, ping me
>> on irc (tilgovi) or join us over on
>> -Randall
>> On Fri, Apr 16, 2010 at 09:54, Fredrik Widlund
>> <> wrote:
>>> Thanks, I will! We will actually use nginx for "dumb" caching, but add an API
>>> layer in between the cache and the couch. Also we actually need to mirror data to
>>> provide HA, and the performance issues we're having are more about constantly
>>> replicating a large number of changes than accelerating the reads. I'm not sure if
>>> couchdb-lounge would address
>>> We did stumble upon a bug that's being addressed and we were also provided with
>>> a temporary work-around, and it could be due to that, but with a quite modest load we
>>> periodically kept hitting the roof of an e5520 quad-core, so I'm a bit worried about
>>> the performance aspect.
>>> Kind regards,
>>> Fredrik Widlund
>>> -----Original Message-----
>>> From: David Coallier []
>>> Sent: 16 April 2010 18:06
>>> To:
>>> Subject: Re: CouchDB and Hadoop_
>>> On 16 April 2010 16:22, Fredrik Widlund <> wrote:
>>>> Well, we're building a solution on Couch and replication on a relatively
>>>> large scale and saying "it just works" doesn't really describe it for us. I really
>>>> like the Couch design but it's a bit of a challenge making it work, for us. I can
>>>> describe the case if you like.
>>>> Also we already have a decentralized distributed file system layer (which
>>>> often is a natural part of a cloud solution I suppose) so if we could run it on top
>>>> of that it would lessen the complexity of the overall solution.
>>>> In any case it was a quick comment to the Hadoop question, and maybe it just
>>>> wouldn't work that way. You could in general discuss atomic operations/locking and
>>>> the performance implications of moving synchronization to a lower abstraction layer
>>>> I guess.
>>> <snip>
>>> You should look into couchdb-lounge. It should resolve most of your
>>> "sharding" replication issues :)
>>> --
>>> David Coallier
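
The two-node setup described earlier in the thread (bidirectional sync plus regular compaction) can be sketched roughly as below. This is a minimal illustration, not what anyone in the thread actually ran: the host names, the database name "docs", and the helper functions are hypothetical, and the sketch only builds the requests rather than sending them. It assumes CouchDB's `POST /_replicate` endpoint with a JSON body containing `source`, `target` and `continuous`, and `POST /<db>/_compact` for compaction.

```python
import json


def replication_request(source, target, continuous=True):
    """Describe a POST to /_replicate (hypothetical helper).

    Returns (method, path, json_body) without sending anything.
    """
    body = {"source": source, "target": target, "continuous": continuous}
    return ("POST", "/_replicate", json.dumps(body, sort_keys=True))


def compaction_request(db):
    """Describe a POST to /<db>/_compact (hypothetical helper).

    When actually sent, the request needs Content-Type: application/json.
    """
    return ("POST", "/" + db + "/_compact", "")


def bidirectional_setup(node_a, node_b, db):
    """Pair each node with the push replication it should be asked to run."""
    return [
        # Node A pushes its local database to node B ...
        (node_a, replication_request(db, node_b + "/" + db)),
        # ... and node B pushes its local database to node A.
        (node_b, replication_request(db, node_a + "/" + db)),
    ]


if __name__ == "__main__":
    # Host names and database name are assumptions for illustration only.
    plan = bidirectional_setup(
        "http://node-a.example:5984", "http://node-b.example:5984", "docs"
    )
    for node, (method, path, body) in plan:
        print(node, method, path, body)
    print(compaction_request("docs"))
```

Each node is asked to push its own copy to the other, so two continuous replications together give the bidirectional sync; compaction is a separate periodic request per node.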
