incubator-couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Compaction Best Practices
Date Thu, 14 Jun 2012 15:25:22 GMT
If there's a quiet period in your day/night cycle (there often isn't),
I'd definitely schedule one then. However, it sounds like you can't go
that long between them, so I'd try once an hour and see how it goes.
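
For example, something like this cron entry would do it (an untested
sketch; the host and the database name "txn_logs" are placeholders):

    0 * * * * curl -s -X POST -H 'Content-Type: application/json' http://127.0.0.1:5984/txn_logs/_compact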

You can now compare the disk_size and data_size of your database to
get an accurate measure of how much disk space compaction will
recover, so perhaps trigger on that instead. I think the auto-compactor can
trigger on that basis but I haven't used it (on Cloudant we've had
this automated for a long time, so it's not something I've ever needed
to look for).
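
To see the two numbers, hit the database info URL (a sketch; again
"txn_logs" stands in for your database name):

    # GET /db returns both fields as of 1.2
    curl -s http://127.0.0.1:5984/txn_logs
    # => {"db_name":"txn_logs","disk_size":96636764160,"data_size":53687091200,...}
    # disk_size - data_size is roughly the space compaction would recover;
    # e.g. compact when data_size/disk_size drops below some threshold:
    curl -s -X POST -H 'Content-Type: application/json' http://127.0.0.1:5984/txn_logs/_compact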

B.

On 14 June 2012 16:17, Nicolas Peeters <peetersn@gmail.com> wrote:
> Totally agree that this is not the best use case for CouchDB. We're looking
> at other options for the very near future. However, now we still have this
> issue that we need to cope with.
>
> So, if you don't mind, back to my original question: if I wanted to use
> compaction or auto-compaction (as in 1.2.0), what would be the best
> schedule? Trigger it a lot, or trigger it as little as possible (while still
> making sure I have enough disk)? And what if I use the strict_window?
>
> On Thu, Jun 14, 2012 at 4:49 PM, Robert Newson <rnewson@apache.org> wrote:
>
>> Final note: couchdb is a database. Databases often make poor
>> transaction logs (though they often have their own transaction logs,
>> in a highly optimized format designed for that purpose), especially
>> ones like couchdb which preserve a tombstone of every document ever
>> seen forever. My suggestion above is really a coping mechanism for
>> using the wrong tool.
>>
>> B.
>>
>> On 14 June 2012 15:47, Robert Newson <rnewson@apache.org> wrote:
>> > The scheme I suggest avoids compaction entirely, which I thought was
>> > your main struggle.
>> >
>> > You still need to delete the documents in the old database so that you
>> > can detect when it's safe to delete it. When it's empty, -X DELETE it.
>> > A database delete is a simple 'rm' of the file, taking very little
>> > time.
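
Something along these lines (a sketch; the database name is a placeholder):

    # the database info document shows the live doc count
    curl -s http://127.0.0.1:5984/txn_logs_old
    # => {..., "doc_count": 0, "doc_del_count": 1234567, ...}
    # once doc_count reads 0, drop the whole file in one cheap call:
    curl -s -X DELETE http://127.0.0.1:5984/txn_logs_old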
>> >
>> > You can ignore the revs_limit suggestions since you don't update the
>> > documents. And you should ignore it even if you do; there's almost no
>> > legitimate case for altering that setting.
>> >
>> > B.
>> >
>> > On 14 June 2012 15:21, Tim Tisdall <tisdall@gmail.com> wrote:
>> >> The deleting doesn't take too much time; it's the compaction process,
>> >> right?  If you have a different DB for each day, then you could
>> >> compact previous days without affecting writing to the current day.
>> >> Also, once you've completely deleted all the records from a previous
>> >> day's set of logs, you could then proceed to just delete that day's
>> >> database instead of compacting it.
>> >>
>> >>
>> >> On Thu, Jun 14, 2012 at 9:30 AM, Nicolas Peeters <nicolists@gmail.com> wrote:
>> >>> A few more hints, after investigation with the team.
>> >>> 1. We can't really have rotating DBs as sometimes we want to keep older
>> >>> transaction records in the DB for a longer time.
>> >>> 2. We never replicate nor update the statements (so the _revs_limit
>> >>> won't really change much (or will it for the compaction??))
>> >>>
>> >>> On Thu, Jun 14, 2012 at 3:14 PM, Nicolas Peeters <nicolists@gmail.com> wrote:
>> >>>
>> >>>> Actually we never modify those records. Just query them up in certain
>> >>>> cases.
>> >>>>
>> >>>> Regarding Robert's suggestion, I was indeed confused because he was
>> >>>> suggesting to delete them one by one.
>> >>>>
>> >>>> I need to read about the "lower_revs_limit". We never replicate this
>> >>>> data.
>> >>>>
>> >>>>
>> >>>> On Thu, Jun 14, 2012 at 3:08 PM, Tim Tisdall <tisdall@gmail.com> wrote:
>> >>>>
>> >>>>> I think he's suggesting avoiding compaction completely. Just delete
>> >>>>> the old DB when you've finished deleting all the records.
>> >>>>>
>> >>>>> On Thu, Jun 14, 2012 at 9:05 AM, Nicolas Peeters <nicolists@gmail.com> wrote:
>> >>>>> > Interesting suggestion. However, this would perhaps have the same effect
>> >>>>> > (deleting/compacting the old DB is what makes the system slower)...?
>> >>>>> >
>> >>>>> > On Thu, Jun 14, 2012 at 2:54 PM, Robert Newson <rnewson@apache.org> wrote:
>> >>>>> >
>> >>>>> >> Do you eventually delete every document you add?
>> >>>>> >>
>> >>>>> >> If so, consider using a rolling database scheme instead. At some
>> >>>>> >> point, perhaps daily, start a new database and write new transaction
>> >>>>> >> logs there. Continue deleting old logs from the previous database(s)
>> >>>>> >> until they're empty (doc_count:0) and then delete the database.
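
A sketch of the rotation (the names and date scheme are made up):

    # each morning, create the new day's database and point writers at it
    curl -s -X PUT http://127.0.0.1:5984/txn_logs_$(date +%Y%m%d)
    # keep deleting old entries from yesterday's database as usual;
    # when its doc_count reaches 0, DELETE the database itself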
>> >>>>> >>
>> >>>>> >> B.
>> >>>>> >>
>> >>>>> >> On 14 June 2012 13:44, Nicolas Peeters <nicolists@gmail.com> wrote:
>> >>>>> >> > I'd like some advice from the community regarding compaction.
>> >>>>> >> >
>> >>>>> >> > *Scenario:*
>> >>>>> >> >
>> >>>>> >> > We have a large-ish CouchDB database that is being used for
>> >>>>> >> > transactional logs (very write heavy). Once in a while, we delete
>> >>>>> >> > some of the records in large batches, and we have scheduled
>> >>>>> >> > compaction (not automatic (yet)) every 12 hours.
>> >>>>> >> >
>> >>>>> >> > From what I can see, the DB is being hammered significantly every
>> >>>>> >> > 12 hours and the compaction is taking 4 hours (with 50-100GB of
>> >>>>> >> > log data).
>> >>>>> >> >
>> >>>>> >> > *The problem:*
>> >>>>> >> >
>> >>>>> >> > The problem is that compaction takes a very long time and reduces
>> >>>>> >> > the performance of the stack. It seems that it's hard for the
>> >>>>> >> > compaction process to "keep up" with the insertions, which is why
>> >>>>> >> > it takes so long. Also, what I'm not sure about is how
>> >>>>> >> > "incremental" the compaction is...
>> >>>>> >> >
>> >>>>> >> >   1. In this case, would it make sense to run the compaction more
>> >>>>> >> >   often (every 10 minutes), since we're write-heavy?
>> >>>>> >> >      1. Should we just run it more often? (so hopefully it
>> >>>>> >> >      doesn't do unnecessary work too often). Actually, in our
>> >>>>> >> >      case, we should probably never have automatic compaction if
>> >>>>> >> >      there has been no "termination".
>> >>>>> >> >      2. Or actually only once in a while? (bigger batches, but
>> >>>>> >> >      less "useless" overhead)
>> >>>>> >> >      3. Or should we just wait until a given size (which is the
>> >>>>> >> >      real problem) is hit and use the auto compaction (in CouchDB
>> >>>>> >> >      1.2.0) for this?
>> >>>>> >> >   2. In CouchDB 1.2.0 there's a new feature: auto compaction
>> >>>>> >> >   <http://wiki.apache.org/couchdb/Compaction#Automatic_Compaction>
>> >>>>> >> >   which may be useful for us. There's the "strict_window" option
>> >>>>> >> >   to give a maximum amount of time to compact and cancel the
>> >>>>> >> >   compaction after that (in order not to have it running for
>> >>>>> >> >   4h+…). I'm wondering what the impact of that is in the long
>> >>>>> >> >   run. What if the compaction cannot be completed in that window?
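
(As I understand the wiki page linked above, the daemon is configured
through the "compactions" config section; a sketch, with illustrative
thresholds and window, would be:

    curl -s -X PUT http://127.0.0.1:5984/_config/compactions/_default \
        -d '"[{db_fragmentation, \"70%\"}, {view_fragmentation, \"60%\"}, {from, \"01:00\"}, {to, \"05:00\"}, {strict_window, true}]"'

If the window closes before compaction finishes, the partial .compact
file is left in place and, as far as I know, the next run resumes from
it rather than starting over.)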
>> >>>>> >> >
>> >>>>> >> > Thanks a lot!
>> >>>>> >> >
>> >>>>> >> > Nicolas
>> >>>>> >>
>> >>>>>
>> >>>>
>> >>>>
>>
