References:
 <4BB9B050-6977-446E-8844-C6FE83DBDDA4@dionne-associates.com> <73943F19-3FBE-43D7-A97B-E012A391BFED@apache.org>
Date: Thu, 22 Dec 2011 17:10:34 -0800
Subject: Re: Understanding the CouchDB file format
From: Randall Leeds
To: dev@couchdb.apache.org
Content-Type: text/plain; charset=UTF-8

On Thu, Dec 22, 2011 at 12:11, Riyad Kalla wrote:
> On Thu, Dec 22, 2011 at 12:38 PM, Robert Newson wrote:
>
>> There are a
>> few parts of the article that are inaccurate (like the assertion we
>> have good locality for the id and seq trees. If this were true we
>> wouldn't have seen such a huge improvement in compaction by
>> temporarily separating them).
>
>
> I'd look forward to more detail on this... it was my understanding the
> updates were appended out in the [doc rev][_id idx diff][_seq idx diff]
> format at the end of the data file. Sounds like I may have misunderstood
> that?
>

Riyad, as you pointed out earlier, all the inner nodes are rewritten
up to the root. The two btrees are not written in parallel, though,
which means that for deep trees all the updated nodes are written
before the other tree's nodes are written. Also remember that the
trees themselves end up pretty fragmented since older nodes that
haven't changed are back toward the beginning of the file. In general,
I'm not sure there's much that's useful to mention about locality in
the trees. Also, updating these trees requires reading the old values,
so there is still seeking that occurs (if the pages aren't cached by
the OS).

>
>> The 'COMPETE recreation' paragraph also
>> strikes me as factually awry.
>>
>
> I'd appreciate a detailed correction on this if it is wrong; all the
> digging I've done (in this thread and other partial resources) suggests
> that the path from the changed doc ref back to the root (including a copy
> of all internal nodes and all of their child references) is written so as
> being able to read-back into the index from the tail of the data file
> quickly... specifically slides 17, 18 and 19 from this slidedeck (
> http://www.slideshare.net/jchrisa/btree-nosql-oak?from=embed) -- note that
> the interim nodes [A..M] and [A..Z] are rewritten (including any and all
> child pointers they contained).
>
> This is what I was referring to; I should either clean up my wording
> (confusing) or I got it wrong in which case I'd appreciate any and all
> corrections.

Right. It mostly seems a bit confusing to me.

"it DOES NOT just rewrite the nodes pathing from the leaf to the node and
ONLY the connections for that single document"

That doesn't sound quite right, but I can tell what you're trying to
say is accurate. If I'm right, you mean that every changed inner node
is rewritten in its entirety rather than having a single pointer to
the new child updated in place.

Cheers. Thanks for taking the time to write this up.

-Randall
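
P.S. In case it helps with the wording, here is a minimal sketch of the
"rewrite every changed inner node in its entirety" idea. It is a toy
model in Python, not CouchDB's actual on-disk format or its btree code:
the AppendOnlyFile class, the record layout, and the update/lookup
helpers are all made up for illustration, list indexes stand in for
file offsets, and node splitting is left out entirely.

```python
import bisect


class AppendOnlyFile:
    """Hypothetical stand-in for the database file: records are only appended."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # list index plays the role of a file offset

    def read(self, offset):
        return self.records[offset]


def leaf(entries):
    return ("leaf", dict(entries))


def inner(children):
    # children: list of (highest key in subtree, offset of child record)
    return ("inner", list(children))


def pick_slot(children, key):
    keys = [k for k, _ in children]
    return min(bisect.bisect_left(keys, key), len(children) - 1)


def lookup(f, root_offset, key):
    kind, body = f.read(root_offset)
    while kind == "inner":
        kind, body = f.read(body[pick_slot(body, key)][1])
    return body.get(key)


def update(f, root_offset, key, value):
    """Set key=value by appending new records; return the new root's offset."""
    # Walk down, remembering each inner node on the path and the slot taken.
    path = []
    kind, body = f.read(root_offset)
    while kind == "inner":
        slot = pick_slot(body, key)
        path.append((body, slot))
        kind, body = f.read(body[slot][1])

    # Append the changed leaf first.
    entries = dict(body)
    entries[key] = value
    child_offset = f.append(leaf(entries))
    child_high = max(entries)

    # Then append a full copy of every inner node on the path, bottom-up.
    # Each copy keeps ALL of its child pointers; only one of them changes.
    for children, slot in reversed(path):
        children = list(children)
        children[slot] = (max(children[slot][0], child_high), child_offset)
        child_offset = f.append(inner(children))
        child_high = children[-1][0]
    return child_offset  # the root is always the last record written


f = AppendOnlyFile()
l0 = f.append(leaf({"a": 1, "c": 2}))
l1 = f.append(leaf({"k": 3, "m": 4}))
l2 = f.append(leaf({"n": 5, "z": 6}))
a_to_m = f.append(inner([("c", l0), ("m", l1)]))
n_to_z = f.append(inner([("z", l2)]))
old_root = f.append(inner([("m", a_to_m), ("z", n_to_z)]))

new_root = update(f, old_root, "b", 9)
```

After the update the file has gained three records at its tail: the new
leaf, a whole new copy of the [a..m] node, and a whole new root. The
[n..z] node and the two untouched leaves stay where they were near the
start of the file, which is the fragmentation I mentioned, and the old
root still describes the tree as it was before the write. Running a
second update for another tree against the same file would append all of
that tree's path only after this one, which is the "not written in
parallel" point.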