couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: why is couch stealing all my resources?
Date Tue, 22 May 2012 11:28:00 GMT
Hi Cory,

The simple answer is that compaction is an intensive process (it reads
the latest versions of all documents and writes them to a new file)
and that your production system appears to have been spec'ed without
testing it under combined user and compaction load.
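
Compaction is started over HTTP, and whether it is worth running depends on how much of the file is stale. The minimal Python sketch below builds (but does not send) the compaction requests and computes the stale share as (disk_size - data_size) / disk_size * 100. It rests on some assumptions. The server URL and database name are hypothetical. As far as we know, `data_size` is only reported by `GET /{db}` from 1.2 onward, so 1.0.x does not provide it. The `Content-Type: application/json` header is included because later 1.x releases reject `_compact` requests without it.

```python
import urllib.request

# Hypothetical base URL; point it at your own CouchDB server.
COUCH = "http://localhost:5984"


def fragmentation_percent(db_info):
    """Percentage of the database file taken up by stale data.

    Assumes db_info is the parsed JSON from GET /{db} and includes both
    disk_size and data_size (data_size is not reported by 1.0.x).
    """
    disk = db_info["disk_size"]
    data = db_info["data_size"]
    if disk == 0:
        return 0.0
    return (disk - data) / disk * 100


def compaction_request(db, design_doc=None):
    """Build, but do not send, the POST that starts a compaction.

    POST /{db}/_compact compacts the database file;
    POST /{db}/_compact/{design_doc} compacts that design doc's view index.
    """
    path = f"/{db}/_compact"
    if design_doc:
        path += f"/{design_doc}"
    return urllib.request.Request(
        COUCH + path,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Made-up sizes, purely to show the arithmetic.
    info = {"disk_size": 12_000_000_000, "data_size": 3_000_000_000}
    print(f"{fragmentation_percent(info):.0f}% stale")  # prints "75% stale"
    req = compaction_request("mydb")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually start the compaction.
```

Because compaction rewrites every live document into a new file, a high stale percentage is what makes the I/O cost worth paying.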

Making compaction more manageable is a project goal. The 1.2 release
adds a compaction daemon which can schedule compaction during off-peak
hours. Of course that's a partial solution, since there is often no
such period. On Cloudant, we have enhanced CouchDB so that compaction
operations run at a lower priority than user queries. This enables us
to run compaction at any time without impacting performance. Some
variant of this work should land in CouchDB core.
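
The 1.2 compaction daemon is driven by per-database rules. As we understand its default.ini, these live in a `[compactions]` config section with keys such as `db_fragmentation`, `from` and `to`; check the documentation for your release. The sketch below only mimics the shape of that decision on the client side: compact when fragmentation is above a threshold *and* the clock is inside an off-peak window. It is not the daemon's code. The 70% threshold and the 00:00 to 04:00 window are made-up defaults.

```python
from datetime import time


def parse_hhmm(text):
    """Parse an 'HH:MM' string into a datetime.time."""
    hours, minutes = (int(part) for part in text.split(":"))
    return time(hours, minutes)


def in_window(now, start, end):
    """True if `now` falls in [start, end); handles windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    # e.g. 23:00 to 04:00 spans midnight
    return now >= start or now < end


def should_compact(frag_pct, now, threshold_pct=70, window=("00:00", "04:00")):
    """Illustrative rule: compact only if fragmented enough AND off-peak.

    threshold_pct and window are made-up defaults, not CouchDB's.
    """
    start, end = (parse_hhmm(t) for t in window)
    return frag_pct >= threshold_pct and in_window(now, start, end)
```

Usage: `should_compact(75, datetime.now().time())` (with `from datetime import datetime`). Two behaviours here are design choices of this sketch: the window is half-open, and it may wrap past midnight. As noted above, a busy site may have no off-peak window at all, and then a rule like this never fires.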

The two recommendations in the previous response are a mixed bag. The
first is true, but I'm sure you didn't expect any system to give more
than 100% of its capacity. The recommendation to use multiple databases
where your application would naturally require one is a clumsy
workaround; I don't recommend it. If you have a problem that big, I
suggest you look at BigCouch, which masks most of the difficulties you
would encounter doing it by hand.
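
The sketch below makes the cost of the multi-database workaround concrete. It is a minimal version of size-based rollover as suggested in the quoted reply: write to `base_0001` until it passes a size limit, then start `base_0002`, and so on. `RolloverNamer`, the naming scheme and the limit are all hypothetical. The consequence to note is that any query over all the data must fan out to every database in `all_databases()` and merge the results. That is the kind of bookkeeping BigCouch's sharding is meant to hide.

```python
class RolloverNamer:
    """Hypothetical helper: choose which database to write to, starting a new
    one (base_0001, base_0002, ...) once the current one reaches max_bytes."""

    def __init__(self, base, max_bytes):
        self.base = base
        self.max_bytes = max_bytes
        self.index = 1

    def current(self):
        """Name of the database currently receiving writes."""
        return f"{self.base}_{self.index:04d}"

    def target(self, current_size_bytes):
        """Return the database to write to, given the current one's size."""
        if current_size_bytes >= self.max_bytes:
            self.index += 1  # roll over to a fresh database
        return self.current()

    def all_databases(self):
        """Every database a whole-dataset query now has to fan out to."""
        return [f"{self.base}_{i:04d}" for i in range(1, self.index + 1)]
```

Each of those databases also carries its own copy of every view index, so view definitions and their rebuilds multiply along with the databases.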

B.

On 22 May 2012 10:54, CGS <cgsmcmlxxv@gmail.com> wrote:
> Hi Cory,
>
> There are a few general points I would mention:
> 1. Try not to let your traffic grow beyond what your hard disk can
> handle. That is, if you notice your traffic becoming too high for the
> read/write capacity of your hard disk, you may want to consider adding
> a gateway and more machines to process your data (or at least more HDD
> units for parallel processing of your data).
> 2. Try to avoid oversized databases. That is, it is preferable to create
> a new database every time your database reaches a certain size. This
> recommendation is based on building your views in parallel, which on
> multicore CPUs (or multiple CPUs) makes better use of your computing
> element(s). Also, searches, insertions and so on will perform better on
> smaller databases.
> These points can easily be implemented using shards in BigCouch, for
> example. If you prefer something more dedicated, you can set up a gateway
> that manages the users' connections (using round-robin, a tree or
> whatever other algorithm). But you would have to implement this last idea
> yourself.
>
> There are other recommendations as well, but I'll stop at these two
> because they are the ones users usually forget to take into account when
> designing projects based on CouchDB.
>
> Of course, your problem may have other sources, so more info (OS, CPU
> and so on) will help in finding them and in adding your problem to the
> common knowledge.
>
> CGS
>
> PS: As you may have noticed, I didn't give any numbers here. That's
> because everything depends on your systems.
>
> On Tue, May 22, 2012 at 8:34 AM, Cory Zue <czue@dimagi.com> wrote:
>
>> Hi folks,
>>
>> I have a production app deployed on CouchDB and it's been going great for
>> a while. However, recently it has started to hog lots of the CPU cycles on
>> our machine. Also, view rebuilds seem to be happening at a crawl (maybe
>> just because the machine is taxed).
>>
>> We're on couch 1.0.1. The database is about 12 GB. We recently ran
>> compaction for the first time. Site traffic has steadily increased
>> recently, but not to the point where I would expect it to have such a
>> large effect on performance.
>>
>> Is there any way to see why couch is taking so many resources? I had
>> thought of temporarily switching the logging to debug mode to see if
>> there's anything interesting there, but I'm curious what other
>> tools/techniques people have used to profile couch.
>>
>> Any general advice on this problem would also be useful.
>>
>> thanks,
>> Cory
>>
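
The question above has two natural starting points, sketched here in Python under assumptions. The first is switching the log level at runtime with `PUT /_config/log/level`, where the body is a JSON string such as `"debug"`. The second is listing what the server is busy with via `GET /_active_tasks`, where running compactions and view index builds show up. The server URL is hypothetical, and both endpoints may require admin credentials. The task field names (`type`, `task`, `status`) reflect the 1.0.x format as we understand it; later releases changed them.

```python
import json
import urllib.request

# Hypothetical base URL; point it at your own CouchDB server.
COUCH = "http://localhost:5984"


def set_log_level_request(level="debug"):
    """Build, but do not send, PUT /_config/log/level with a JSON string body."""
    return urllib.request.Request(
        f"{COUCH}/_config/log/level",
        data=json.dumps(level).encode(),  # e.g. b'"debug"'
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def summarize_tasks(tasks):
    """Condense the list returned by GET /_active_tasks to one line per task.

    Assumes 1.0.x-style entries with type/task/status; missing keys show '?'.
    """
    return [
        f"{t.get('type', '?')}: {t.get('task', '?')} ({t.get('status', '?')})"
        for t in tasks
    ]


def fetch_active_tasks():
    """GET /_active_tasks from a running server (may need admin credentials)."""
    with urllib.request.urlopen(f"{COUCH}/_active_tasks") as resp:
        return json.load(resp)
```

Usage: `for line in summarize_tasks(fetch_active_tasks()): print(line)`. Debug logging is verbose and adds disk writes on an already loaded machine, so it is worth setting the level back to `info` once you have what you need.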
