incubator-couchdb-user mailing list archives

From Tim Tisdall <tisd...@gmail.com>
Subject Re: Compaction Best Practices
Date Thu, 14 Jun 2012 14:21:46 GMT
The deleting doesn't take too much time; it's the compaction process
that's slow, right?  If you have a different DB for each day, then you
could compact previous days without affecting writes to the current day.
Also, once you've completely deleted all the records from a previous
day's set of logs, you could just delete that day's database instead of
compacting it.
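
To make that concrete, here's a minimal sketch of the per-day scheme
(assuming databases named like "txlogs-2012-06-13" and CouchDB on
localhost:5984 with no auth; the names and URL are just placeholders):

import json
import urllib.request

COUCH = "http://localhost:5984"

def couch(method, path, body=None):
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(COUCH + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def tidy_previous_day(db):
    info = couch("GET", "/" + db)
    if info["doc_count"] == 0:
        # All logs for that day are already deleted; dropping the whole
        # database reclaims the space at once, no compaction needed.
        couch("DELETE", "/" + db)
    else:
        # Otherwise compact yesterday's database. New writes only go to
        # today's database, so this doesn't compete with the write load.
        couch("POST", "/" + db + "/_compact")

tidy_previous_day("txlogs-2012-06-13")

Something like that from a daily cron keeps compaction (or deletion) off
the database that's currently taking writes.

(Re the 1.2.0 auto-compaction / strict_window question further down in
the thread: see the config sketch at the very bottom of this mail.)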


On Thu, Jun 14, 2012 at 9:30 AM, Nicolas Peeters <nicolists@gmail.com> wrote:
> A few more hints, after investigation with the team.
> 1. We can't really have rotating DBs as sometimes we want to keep older
> transaction records in the DB for a longer time.
> 2. We never replicate or update the statements (so the _revs_limit won't
> really change much (or will it, for the compaction??))
>
> On Thu, Jun 14, 2012 at 3:14 PM, Nicolas Peeters <nicolists@gmail.com>wrote:
>
>> Actually we never modify those records. We just query them in certain
>> cases.
>>
>> Regarding Robert's suggestion, I was indeed confused because he was
>> suggesting to delete them one by one.
>>
>> I need to read about lowering the "_revs_limit". We never replicate this data.
>>
>>
>> On Thu, Jun 14, 2012 at 3:08 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>
>>> I think he's suggesting avoiding compaction completely.  Just delete
>>> the old DB when you've finished deleting all the records.
>>>
>>> On Thu, Jun 14, 2012 at 9:05 AM, Nicolas Peeters <nicolists@gmail.com>
>>> wrote:
>>> > Interesting suggestion. However, this would perhaps have the same effect
>>> > (deleting/compacting the old DB is what makes the system slower)...?
>>> >
>>> > On Thu, Jun 14, 2012 at 2:54 PM, Robert Newson <rnewson@apache.org>
>>> wrote:
>>> >
>>> >> Do you eventually delete every document you add?
>>> >>
>>> >> If so, consider using a rolling database scheme instead. At some
>>> >> point, perhaps daily, start a new database and write new transaction
>>> >> logs there. Continue deleting old logs from the previous database(s)
>>> >> until they're empty (doc_count:0) and then delete the database.
>>> >>
>>> >> B.
>>> >>
>>> >> On 14 June 2012 13:44, Nicolas Peeters <nicolists@gmail.com> wrote:
>>> >> > I'd like some advice from the community regarding compaction.
>>> >> >
>>> >> > *Scenario:*
>>> >> >
>>> >> > We have a large-ish CouchDB database that is being used for
>>> >> > transactional logs (very write heavy). Once in a while, we delete
>>> >> > some of the records in large batches, and we have scheduled
>>> >> > compaction (not automatic (yet)) every 12 hours.
>>> >> >
>>> >> > From what I can see, the DB is being hammered significantly every
>>> >> > 12 hours and the compaction is taking 4 hours (with a size of
>>> >> > 50-100GB of log data).
>>> >> >
>>> >> > *The problem:*
>>> >> >
>>> >> > The problem is that compaction takes a very long time and reduces
>>> >> > the performance of the stack. It seems that it's hard for the
>>> >> > compaction process to "keep up" with the insertions, which is why
>>> >> > it takes so long. Also, what I'm not sure about is how
>>> >> > "incremental" the compaction is...
>>> >> >
>>> >> >   1. In this case, would it make sense to run the compaction more
>>> >> >   often (every 10 minutes), since we're write-heavy?
>>> >> >      1. Should we just run it more often? (so hopefully it doesn't
>>> >> >      do unnecessary work too often) Actually, in our case, we
>>> >> >      should probably never have automatic compaction if there has
>>> >> >      been no "termination".
>>> >> >      2. Or actually only once in a while? (bigger batch, but less
>>> >> >      "useless" overhead)
>>> >> >      3. Or should we just wait until a given size (which is really
>>> >> >      the problem) is hit and use the auto compaction (in CouchDB
>>> >> >      1.2.0) for this?
>>> >> >   2. In CouchDB 1.2.0 there's a new feature: auto compaction
>>> >> >   <http://wiki.apache.org/couchdb/Compaction#Automatic_Compaction>
>>> >> >   which may be useful for us. There's the "strict_window" feature
>>> >> >   to give a max amount of time to compact and cancel the
>>> >> >   compaction after that (in order not to have it running for
>>> >> >   4h+…). I'm wondering what the impact of that is in the long
>>> >> >   run. What if the compaction cannot be completed in that window?
>>> >> >
>>> >> > Thanks a lot!
>>> >> >
>>> >> > Nicolas
>>> >>
>>>
>>
>>
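
(Regarding the auto-compaction / strict_window question in Nicolas's
original mail above: the wiki page linked there describes an ini-level
config for the compaction daemon. From memory it looks roughly like the
following; the thresholds and window are only example values, so
double-check the exact syntax on that wiki page:

[compaction_daemon]
check_interval = 300
min_file_size = 131072

[compactions]
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "01:00"}, {to, "05:00"}, {strict_window, true}]

With strict_window set to true, my understanding is that the daemon
kills a compaction still running at the end of the window; the partial
.compact file is left on disk and picked up again on the next run, so
the work isn't thrown away, but if writes consistently outpace the
window the compaction may simply never catch up. Worth verifying
against the docs before relying on it.)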
