From: Nicolas Peeters
Date: Thu, 14 Jun 2012 17:29:18 +0200
Subject: Re: Compaction Best Practices
To: user@couchdb.apache.org
Reply-To: user@couchdb.apache.org
Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm

Right now we have one compaction per 12 hours, and the compaction itself takes 4 hours. There's no quiet period, unfortunately. Your trigger is a great idea; I need to see if it's possible with 1.2.0.

On Thu, Jun 14, 2012 at 5:25 PM, Robert Newson wrote:
> If there's a quiet period in your day/night cycle (there often isn't),
> I'd definitely schedule one then. However, it sounds like you can't go
> that long between them, so I'd try once an hour and see how it goes.
>
> You can now compare the disk_size and data_size of your database to
> get an accurate measure of how much disk space you'll recover by
> compacting, so perhaps trigger on that instead. I think the
> auto-compactor can trigger on that basis, but I haven't used it (on
> Cloudant we've had this automated for a long time, so it's not
> something I've ever needed to look for).
>
> B.
>
> On 14 June 2012 16:17, Nicolas Peeters wrote:
> > I totally agree that this is not the best use case for CouchDB. We're
> > looking at other options for the very near future. However, for now we
> > still have this issue that we need to cope with.
> >
> > So, if you don't mind, back to my original question: if I wanted to use
> > compaction or auto-compaction (as in 1.2.0), what would be the best
> > schedule? Trigger it often, or trigger it as rarely as possible (while
> > still making sure I have enough disk)? And what if I use the strict_window?
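Robert's disk_size/data_size suggestion can be scripted externally. The sketch below is illustrative, not from the thread: the 70% threshold is made up, and it only shows the decision logic; in CouchDB 1.2 the db info JSON from GET /dbname carries both fields, and compaction is started with POST /dbname/_compact (Content-Type: application/json).

```python
# Sketch of a fragmentation-based compaction trigger (threshold is an
# assumption, not a recommendation from the thread).
#
# In CouchDB >= 1.1, GET /dbname returns db info JSON including
# "disk_size" (bytes on disk) and "data_size" (bytes of live data);
# compaction is triggered with:
#   curl -X POST http://localhost:5984/dbname/_compact \
#        -H "Content-Type: application/json"

def fragmentation(disk_size, data_size):
    """Fraction of the database file that compaction would reclaim."""
    if disk_size <= 0:
        return 0.0
    return (disk_size - data_size) / disk_size

def should_compact(disk_size, data_size, threshold=0.7):
    """Trigger once e.g. 70% of the file is reclaimable overhead."""
    return fragmentation(disk_size, data_size) >= threshold

# e.g. an 80 GB file holding only 20 GB of live data is 75% reclaimable:
# should_compact(80, 20) -> True
```

Run from cron (say, hourly, per Robert's suggestion), this compacts only when there is actually enough garbage to make the 4-hour pass worthwhile.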
> >
> > On Thu, Jun 14, 2012 at 4:49 PM, Robert Newson wrote:
> >
> >> Final note: couchdb is a database. Databases often make poor
> >> transaction logs (though they often have their own transaction logs,
> >> in a highly optimized format designed for that purpose), especially
> >> ones like couchdb, which preserves a tombstone of every document ever
> >> seen, forever. My suggestion above is really a coping mechanism for
> >> using the wrong tool.
> >>
> >> B.
> >>
> >> On 14 June 2012 15:47, Robert Newson wrote:
> >> > The scheme I suggest avoids compaction entirely, which I thought was
> >> > your main struggle.
> >> >
> >> > You still need to delete the documents in the old database so that you
> >> > can detect when it's safe to delete it. When it's empty, -X DELETE it.
> >> > A database delete is a simple 'rm' of the file, taking very little
> >> > time.
> >> >
> >> > You can ignore the revs_limit suggestions, since you don't update the
> >> > documents. And you should ignore them even if you do; there's almost no
> >> > legitimate case for altering that setting.
> >> >
> >> > B.
> >> >
> >> > On 14 June 2012 15:21, Tim Tisdall wrote:
> >> >> The deleting doesn't take too much time; it's the compaction process,
> >> >> right? If you have a different DB for each day, then you could
> >> >> compact previous days without affecting writes to the current day.
> >> >> Also, once you've completely deleted all the records from a previous
> >> >> day's set of logs, you could just delete that day's database instead
> >> >> of compacting it.
> >> >>
> >> >> On Thu, Jun 14, 2012 at 9:30 AM, Nicolas Peeters <nicolists@gmail.com> wrote:
> >> >>> A few more hints, after investigation with the team:
> >> >>> 1. We can't really have rotating DBs, as sometimes we want to keep older
> >> >>> transaction records in the DB for a longer time.
> >> >>> 2.
> >> >>> We never replicate nor update the statements (so the _revs_limit
> >> >>> won't really change much, or will it, for the compaction?).
> >> >>>
> >> >>> On Thu, Jun 14, 2012 at 3:14 PM, Nicolas Peeters <nicolists@gmail.com> wrote:
> >> >>>
> >> >>>> Actually, we never modify those records. We just query them in
> >> >>>> certain cases.
> >> >>>>
> >> >>>> Regarding Robert's suggestion, I was indeed confused, because he
> >> >>>> was suggesting to delete them one by one.
> >> >>>>
> >> >>>> I need to read about the "lower_revs_limit". We never replicate
> >> >>>> this data.
> >> >>>>
> >> >>>> On Thu, Jun 14, 2012 at 3:08 PM, Tim Tisdall wrote:
> >> >>>>
> >> >>>>> I think he's suggesting avoiding compaction completely. Just delete
> >> >>>>> the old DB when you've finished deleting all the records.
> >> >>>>>
> >> >>>>> On Thu, Jun 14, 2012 at 9:05 AM, Nicolas Peeters <nicolists@gmail.com> wrote:
> >> >>>>> > Interesting suggestion. However, this would perhaps have the
> >> >>>>> > same effect (deleting/compacting the old DB is what makes the
> >> >>>>> > system slower)...?
> >> >>>>> >
> >> >>>>> > On Thu, Jun 14, 2012 at 2:54 PM, Robert Newson <rnewson@apache.org> wrote:
> >> >>>>> >
> >> >>>>> >> Do you eventually delete every document you add?
> >> >>>>> >>
> >> >>>>> >> If so, consider using a rolling database scheme instead. At some
> >> >>>>> >> point, perhaps daily, start a new database and write new
> >> >>>>> >> transaction logs there. Continue deleting old logs from the
> >> >>>>> >> previous database(s) until they're empty (doc_count: 0) and then
> >> >>>>> >> delete the database.
> >> >>>>> >>
> >> >>>>> >> B.
> >> >>>>> >>
> >> >>>>> >> On 14 June 2012 13:44, Nicolas Peeters wrote:
> >> >>>>> >> > I'd like some advice from the community regarding compaction.
> >> >>>>> >> >
> >> >>>>> >> > *Scenario:*
> >> >>>>> >> >
> >> >>>>> >> > We have a large-ish CouchDB database that is being used for
> >> >>>>> >> > transactional logs (very write-heavy). Once in a while, we
> >> >>>>> >> > delete some of the records in large batches, and we have
> >> >>>>> >> > scheduled compaction (not automatic (yet)) every 12 hours.
> >> >>>>> >> >
> >> >>>>> >> > From what I can see, the DB is being hammered significantly
> >> >>>>> >> > every 12 hours, and the compaction is taking 4 hours (with a
> >> >>>>> >> > size of 50-100 GB of log data).
> >> >>>>> >> >
> >> >>>>> >> > *The problem:*
> >> >>>>> >> >
> >> >>>>> >> > The problem is that compaction takes a very long time and
> >> >>>>> >> > reduces the performance of the stack. It seems that it's hard
> >> >>>>> >> > for the compaction process to "keep up" with the insertions,
> >> >>>>> >> > which is why it takes so long. Also, what I'm not sure about
> >> >>>>> >> > is how "incremental" the compaction is...
> >> >>>>> >> >
> >> >>>>> >> > 1. In this case, would it make sense to run the compaction
> >> >>>>> >> >    more often (every 10 minutes), since we're write-heavy?
> >> >>>>> >> >    1. Should we just run it more often (so hopefully it
> >> >>>>> >> >       doesn't do unnecessary work too often)? Actually, in our
> >> >>>>> >> >       case, we should probably never have automatic compaction
> >> >>>>> >> >       if there has been no "termination".
> >> >>>>> >> >    2. Or actually only once in a while (a bigger batch, but
> >> >>>>> >> >       less "useless" overhead)?
> >> >>>>> >> >    3. Or should we just wait until a given size (which is the
> >> >>>>> >> >       real problem) is hit, and use the auto compaction (in
> >> >>>>> >> >       CouchDB 1.2.0) for this?
> >> >>>>> >> > 2.
> >> >>>>> >> > In CouchDB 1.2.0 there's a new feature, auto compaction
> >> >>>>> >> > <http://wiki.apache.org/couchdb/Compaction#Automatic_Compaction>,
> >> >>>>> >> > which may be useful for us. There's the "strict_window"
> >> >>>>> >> > feature to give a max amount of time to compact and cancel
> >> >>>>> >> > the compaction after that (in order not to have it running
> >> >>>>> >> > for 4h+…). I'm wondering what the impact of that is in the
> >> >>>>> >> > long run. What if the compaction cannot be completed in that
> >> >>>>> >> > window?
> >> >>>>> >> >
> >> >>>>> >> > Thanks a lot!
> >> >>>>> >> >
> >> >>>>> >> > Nicolas
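For reference, the 1.2.0 auto-compaction discussed above is configured through the compaction daemon's ini settings, along the lines of the wiki page linked in the thread. The values below are illustrative only, not a recommendation from the thread:

```ini
; CouchDB 1.2.x compaction daemon (e.g. in local.ini).
[compaction_daemon]
check_interval = 300     ; seconds between fragmentation checks
min_file_size = 131072   ; skip databases smaller than this (bytes)

[compactions]
; Compact any database once 70% of its file is reclaimable, but only
; between 23:00 and 04:00; with strict_window true, a compaction still
; running at the end of the window is canceled (and retried next window).
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "23:00"}, {to, "04:00"}, {strict_window, true}]
```

A canceled strict_window run is not wasted outright, since the daemon simply retries in the next window, but on a database that gains garbage faster than a window can clear it, the fragmentation will keep growing, which is the scenario Nicolas is asking about.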