couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Compaction Best Practices
Date Thu, 14 Jun 2012 14:49:07 GMT
Final note: CouchDB is a database. Databases often make poor
transaction logs (though they often have their own transaction logs,
in a highly optimized format designed for that purpose), especially
ones like CouchDB, which preserves a tombstone of every document ever
seen, forever. My suggestion above is really a coping mechanism for
using the wrong tool.

B.

On 14 June 2012 15:47, Robert Newson <rnewson@apache.org> wrote:
> The scheme I suggest avoids compaction entirely, which I thought was
> your main struggle.
>
> You still need to delete the documents in the old database so that you
> can detect when it's safe to delete it. When it's empty, -X DELETE it.
> A database delete is a simple 'rm' of the file, taking very little
> time.
>
> You can ignore the revs_limit suggestions since you don't update the
> documents. And you should ignore them even if you do; there's almost no
> legitimate case for altering that setting.
>
> B.
>
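A minimal sketch of that teardown over HTTP, assuming a local node at
http://localhost:5984 and a per-day database named txlog_2012_06_13
(names and response bodies are illustrative, not from the thread; add
admin credentials if the server isn't in admin party mode):

  # Poll the old day's database until every log document has been deleted.
  # The info document reports doc_count (live docs) and doc_del_count (tombstones).
  curl -s http://localhost:5984/txlog_2012_06_13
  # => {"db_name":"txlog_2012_06_13","doc_count":0,"doc_del_count":...,...}

  # Once doc_count is 0, drop the whole database; this is effectively an 'rm'
  # of the underlying .couch file and returns almost immediately.
  curl -X DELETE http://localhost:5984/txlog_2012_06_13
  # => {"ok":true}
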
> On 14 June 2012 15:21, Tim Tisdall <tisdall@gmail.com> wrote:
>> The deleting doesn't take too much time; it's the compaction process,
>> right?  If you have a different DB for each day, then you could
>> compact previous days without affecting writing to the current day.
>> Also, once you've completely deleted all the records from a previous
>> day's set of logs, you could then proceed to just delete that day's
>> database instead of compacting it.
>>
>>
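A rough sketch of what compacting only a previous day's database would
look like, using the same placeholder names as above (today's writes go
to a different file entirely, so they are not slowed by this):

  # Kick off compaction on yesterday's database.
  curl -X POST -H 'Content-Type: application/json' \
       http://localhost:5984/txlog_2012_06_13/_compact

  # The database info document shows whether compaction is still running.
  curl -s http://localhost:5984/txlog_2012_06_13 | grep -o '"compact_running":[a-z]*'
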
>> On Thu, Jun 14, 2012 at 9:30 AM, Nicolas Peeters <nicolists@gmail.com> wrote:
>>> A few more hints, after investigation with the team.
>>> 1. We can't really have rotating DBs, as sometimes we want to keep older
>>> transaction records in the DB for a longer time.
>>> 2. We never replicate or update the statements (so the _revs_limit won't
>>> really change much (or will it for the compaction??))
>>>
>>> On Thu, Jun 14, 2012 at 3:14 PM, Nicolas Peeters <nicolists@gmail.com> wrote:
>>>
>>>> Actually we never modify those records. Just query them up in certain
>>>> cases.
>>>>
>>>> Regarding Robert's suggestion, I was indeed confused because he was
>>>> suggesting to delete them one by one.
>>>>
>>>> I need to read about lowering the "_revs_limit". We never replicate this data.
>>>>
>>>>
>>>> On Thu, Jun 14, 2012 at 3:08 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>>>>
>>>>> I think he's suggesting avoiding compaction completely.  Just delete
>>>>> the old DB when you've finished deleting all the records.
>>>>>
>>>>> On Thu, Jun 14, 2012 at 9:05 AM, Nicolas Peeters <nicolists@gmail.com>
>>>>> wrote:
>>>>> > Interesting suggestion. However, this would perhaps have the same
>>>>> > effect (deleting/compacting the old DB is what makes the system
>>>>> > slower)...?
>>>>> >
>>>>> > On Thu, Jun 14, 2012 at 2:54 PM, Robert Newson <rnewson@apache.org>
>>>>> > wrote:
>>>>> >
>>>>> >> Do you eventually delete every document you add?
>>>>> >>
>>>>> >> If so, consider using a rolling database scheme instead. At some
>>>>> >> point, perhaps daily, start a new database and write new transaction
>>>>> >> logs there. Continue deleting old logs from the previous database(s)
>>>>> >> until they're empty (doc_count:0) and then delete the database.
>>>>> >>
>>>>> >> B.
>>>>> >>
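For the "continue deleting old logs" step, the deletes would normally be
batched through _bulk_docs rather than issued one at a time; a hedged
sketch (the ids and revs are made up, and in practice would come from an
_all_docs or view query):

  # Mark a batch of old log documents as deleted in a single request.
  curl -X POST -H 'Content-Type: application/json' \
       http://localhost:5984/txlog_2012_06_13/_bulk_docs \
       -d '{"docs":[
             {"_id":"log-0001","_rev":"1-abc123","_deleted":true},
             {"_id":"log-0002","_rev":"1-def456","_deleted":true}
           ]}'
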
>>>>> >> On 14 June 2012 13:44, Nicolas Peeters <nicolists@gmail.com> wrote:
>>>>> >> > I'd like some advice from the community regarding compaction.
>>>>> >> >
>>>>> >> > *Scenario:*
>>>>> >> >
>>>>> >> > We have a large-ish CouchDB database that is being used for
>>>>> >> > transactional logs (very write heavy). Once in a while, we delete
>>>>> >> > some of the records in large batches, and we have scheduled
>>>>> >> > compaction (not automatic (yet)) every 12 hours.
>>>>> >> >
>>>>> >> > From what I can see, the DB is being hammered significantly every
>>>>> >> > 12 hours and the compaction is taking 4 hours (with a size of
>>>>> >> > 50-100GB of log data).
>>>>> >> >
>>>>> >> > *The problem:*
>>>>> >> >
>>>>> >> > The problem is that compaction takes a very long time and reduces
>>>>> >> > the performance of the stack. It seems that it's hard for the
>>>>> >> > compaction process to "keep up" with the insertions, which is why
>>>>> >> > it takes so long. Also, what I'm not sure about is how
>>>>> >> > "incremental" the compaction is...
>>>>> >> >
>>>>> >> >   1. In this case, would it make sense to run the compaction more
>>>>> >> >   often (every 10 minutes), since we're write-heavy?
>>>>> >> >      1. Should we just run it more often? (so hopefully it doesn't
>>>>> >> >      do unnecessary work too often). Actually, in our case, we
>>>>> >> >      should probably never have automatic compaction if there has
>>>>> >> >      been no "termination".
>>>>> >> >      2. Or actually only once in a while? (bigger batch, but less
>>>>> >> >      "useless" overhead)
>>>>> >> >      3. Or should we just wait until a given size (which is the
>>>>> >> >      real problem) is reached and use the auto compaction (in
>>>>> >> >      CouchDB 1.2.0) for this?
>>>>> >> >   2. In CouchDB 1.2.0 there's a new feature: auto compaction
>>>>> >> >   <http://wiki.apache.org/couchdb/Compaction#Automatic_Compaction>
>>>>> >> >   which may be useful for us. There's the "strict_window" option
>>>>> >> >   to give a max amount of time to compact and cancel the
>>>>> >> >   compaction after that (in order not to have it running for
>>>>> >> >   4h+…). I'm wondering what the impact of that is in the long
>>>>> >> >   run. What if the compaction cannot be completed in that window?
>>>>> >> >
>>>>> >> > Thanks a lot!
>>>>> >> >
>>>>> >> > Nicolas
>>>>> >>
>>>>>
>>>>
>>>>
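On the CouchDB 1.2.0 auto-compaction question: the rule is server
configuration (the wiki page linked above documents the options) and can
be set either in local.ini or over the _config API. A sketch with purely
illustrative thresholds and window, assuming admin access on a 1.2.x
node:

  # Compact databases above ~70% fragmentation, but only between 23:00 and
  # 04:00, and cancel at the end of the window instead of overrunning it.
  curl -X PUT http://localhost:5984/_config/compactions/_default \
       -d '"[{db_fragmentation, \"70%\"}, {view_fragmentation, \"60%\"}, {from, \"23:00\"}, {to, \"04:00\"}, {strict_window, true}]"'

  # How often the compaction daemon re-checks fragmentation (seconds).
  curl -X PUT http://localhost:5984/_config/compaction_daemon/check_interval -d '"300"'

If the window closes before compaction finishes, the run is cancelled at
that point; my understanding is that the partially written .compact file
is picked up again in the next window rather than starting from scratch,
but that is worth confirming against the wiki page before relying on it.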
