From: Robert Newson
To: user@couchdb.apache.org
Date: Thu, 14 Jun 2012 15:47:03 +0100
Subject: Re: Compaction Best Practices

The scheme I suggest avoids compaction entirely, which I thought was
your main struggle. You still need to delete the documents in the old
database so that you can detect when it's safe to delete it. When it's
empty, -X DELETE it. A database delete is a simple 'rm' of the file,
taking very little time.

You can ignore the revs_limit suggestions since you don't update the
documents. You should ignore them even if you do; there's almost no
legitimate case for altering that setting.
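Roughly, the whole cycle looks like this end to end (an untested
sketch against a local node; the database names and dates are just
for illustration):

    # 1. each day, start a fresh database and point new writes at it
    curl -X PUT http://localhost:5984/logs_2012_06_14

    # 2. keep deleting old entries from yesterday's database as usual,
    #    and poll its metadata to see when it's empty
    curl http://localhost:5984/logs_2012_06_13
    # -> {"db_name":"logs_2012_06_13","doc_count":0,"doc_del_count":...}

    # 3. once doc_count reaches 0, drop the whole database; this is a
    #    file unlink, not a compaction, so it returns almost instantly
    curl -X DELETE http://localhost:5984/logs_2012_06_13

No compaction ever runs; the space held by the deleted documents (and
their tombstones) comes back when the old database file is removed.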
B.

On 14 June 2012 15:21, Tim Tisdall wrote:
> The deleting doesn't take too much time, it's the compaction process,
> right?  If you have a different DB for each day, then you could
> compact previous days without affecting writing to the current day.
> Also, once you've completely deleted all the records from a previous
> day's set of logs, you could then proceed to just delete that day's
> database instead of compacting it.
>
>
> On Thu, Jun 14, 2012 at 9:30 AM, Nicolas Peeters wrote:
>> A few more hints, after investigation with the team.
>> 1. We can't really have rotating DBs, as sometimes we want to keep older
>> transaction records in the DB for a longer time.
>> 2. We never replicate nor update the statements (so the _revs_limit won't
>> really change much (or will it for the compaction?)).
>>
>> On Thu, Jun 14, 2012 at 3:14 PM, Nicolas Peeters wrote:
>>
>>> Actually, we never modify those records. We just query them in certain
>>> cases.
>>>
>>> Regarding Robert's suggestion, I was indeed confused because he was
>>> suggesting to delete them one by one.
>>>
>>> I need to read about lowering the "_revs_limit". We never replicate
>>> this data.
>>>
>>>
>>> On Thu, Jun 14, 2012 at 3:08 PM, Tim Tisdall wrote:
>>>
>>>> I think he's suggesting avoiding compaction completely.  Just delete
>>>> the old DB when you've finished deleting all the records.
>>>>
>>>> On Thu, Jun 14, 2012 at 9:05 AM, Nicolas Peeters
>>>> wrote:
>>>> > Interesting suggestion. However, wouldn't this have the same effect?
>>>> > (Deleting/compacting the old DB is what makes the system slower.)
>>>> >
>>>> > On Thu, Jun 14, 2012 at 2:54 PM, Robert Newson
>>>> wrote:
>>>> >
>>>> >> Do you eventually delete every document you add?
>>>> >>
>>>> >> If so, consider using a rolling database scheme instead. At some
>>>> >> point, perhaps daily, start a new database and write new transaction
>>>> >> logs there. Continue deleting old logs from the previous database(s)
>>>> >> until they're empty (doc_count:0) and then delete the database.
>>>> >>
>>>> >> B.
>>>> >>
>>>> >> On 14 June 2012 13:44, Nicolas Peeters wrote:
>>>> >> > I'd like some advice from the community regarding compaction.
>>>> >> >
>>>> >> > *Scenario:*
>>>> >> >
>>>> >> > We have a large-ish CouchDB database that is being used for
>>>> >> > transactional logs (very write-heavy). Once in a while, we delete
>>>> >> > some of the records in large batches, and we have scheduled
>>>> >> > compaction (not automatic (yet)) every 12 hours.
>>>> >> >
>>>> >> > From what I can see, the DB is being hammered significantly every
>>>> >> > 12 hours and the compaction is taking 4 hours (with a size of
>>>> >> > 50-100GB of log data).
>>>> >> >
>>>> >> > *The problem:*
>>>> >> >
>>>> >> > The problem is that compaction takes a very long time and reduces
>>>> >> > the performance of the stack. It seems that it's hard for the
>>>> >> > compaction process to "keep up" with the insertions, which is why
>>>> >> > it takes so long. Also, what I'm not sure about is how
>>>> >> > "incremental" the compaction is...
>>>> >> >
>>>> >> >   1. In this case, would it make sense to run the compaction more
>>>> >> >   often (every 10 minutes), since we're write-heavy?
>>>> >> >      1. Should we just run it more often? (So hopefully it doesn't
>>>> >> >      do unnecessary work too often.) Actually, in our case, we
>>>> >> >      should probably never have automatic compaction if there has
>>>> >> >      been no "termination".
>>>> >> >      2. Or actually only once in a while? (A bigger batch, but
>>>> >> >      less "useless" overhead.)
>>>> >> >      3. Or should we just wait until a given size (which is the
>>>> >> >      real problem) is hit and use the auto-compaction (in CouchDB
>>>> >> >      1.2.0) for this?
>>>> >> >   2. In CouchDB 1.2.0 there's a new feature, auto compaction
>>>> >> >   <http://wiki.apache.org/couchdb/Compaction#Automatic_Compaction>,
>>>> >> >   which may be useful for us. There's the "strict_window" setting
>>>> >> >   to give a max amount of time to compact and cancel the
>>>> >> >   compaction after that (in order not to have it running for
>>>> >> >   4h+…). I'm wondering what the impact of that is in the long
>>>> >> >   run. What if the compaction cannot be completed in that window?
>>>> >> >
>>>> >> > Thanks a lot!
>>>> >> >
>>>> >> > Nicolas
>>>> >>
>>>>
>>>
>>
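P.S. On the auto-compaction question quoted above: in 1.2.0 the
compaction daemon is configured in the [compactions] section of
local.ini (it can also be set at runtime through the _config API).
A sketch only; the thresholds and times below are invented, so check
the wiki page linked above for the exact syntax:

    [compactions]
    ; example values: compact a database once it's roughly 70% garbage,
    ; but only between 01:00 and 05:00, and cancel any run that
    ; overflows that window
    _default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "01:00"}, {to, "05:00"}, {strict_window, true}]

As far as I know, a run cancelled by strict_window isn't wasted work:
the partial .compact file is kept and the next scheduled run resumes
from it, so a large database can be compacted incrementally across
several windows.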