From: Robert Newson
To: dev@couchdb.apache.org
Date: Mon, 11 Jan 2010 09:39:20 +0000
Subject: Re: svn commit: r897509 - /couchdb/trunk/etc/couchdb/default.ini.tpl.in
Message-ID: <46aeb24f1001110139r3d4e020eyffb16c40d1a3ca84@mail.gmail.com>
In-Reply-To: <46aeb24f1001110133m37e09845g32e4ff0880ff0fa3@mail.gmail.com>
References: <20100109182316.B88D923889D5@eris.apache.org> <46aeb24f1001110133m37e09845g32e4ff0880ff0fa3@mail.gmail.com>

Roger,

Unless I misread you, you are implying that _id is stored increasingly
less efficiently in the .couch file as its length increases? Unless
you've really dug into the disk structure, I don't think this
assertion will hold.

A better measure is the number of btree nodes on disk for the
different kinds of _id. I would expect to see more nodes (and a higher
rate of 'turnover') for large random ids than for equally large
sequential ids. Large random ids versus short sequential ids should
show a larger difference, but for two unrelated reasons.

B.

On Mon, Jan 11, 2010 at 9:33 AM, Robert Newson wrote:
> I should point out that the sequential algorithm in couchdb is
> carefully constructed so that generated ids won't clash, even in a
> distributed system. You might have assumed that the sequential ids
> were 1, 2, 3, 4, ... and so on, but they are not.
>
> The sequential ids are the same length as the random ids (16 bytes).
> The first 13 bytes stay the same for around 8000 generated ids and are
> then rerandomized. The ids with the same prefix have suffixes in
> strictly increasing numeric order.
> This characteristic (that a new id
> is numerically close to the previous id) is what helps with insertion
> speed and general b-tree performance.
>
> Before changing the default I think it would be worth getting numbers
> from a suitably fair benchmark. I would still advocate random as the
> default until that is done.
>
> B.
>
> On Mon, Jan 11, 2010 at 12:51 AM, Chris Anderson wrote:
>> On Sun, Jan 10, 2010 at 4:24 PM, Roger Binns wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Chris Anderson wrote:
>>>> I'm not feeling super-strong about this. However, making the default
>>>> sequential seems like it will preempt a lot of the problems people
>>>> tend to show up asking about.
>>>
>>
>> If we think that speed and size are more important than randomness, we
>> should continue to refine uuid generators.
>>
>> Roger, if you can make a short sequential that'd be neat.
>>
>>> There are several issues conflated together:
>>>
>>> - - When doing inserts, sorted ids are faster
>>>
>>> - - The resulting size of the db file is the size of the docs plus a multiple
>>> of the _id size (and probably an exponential of the size)
>>>
>>> - - Sequential ids give small _id
>>>
>>> - - Random ids give large _id
>>>
>>> - - Sequentials will clash between different dbs (consider replication,
>>> multiple instances etc). They'll also lead people to rely on this
>>> functionality as though it was like a SQL primary key
>>>
>>> - - Random ids won't clash and better illustrate how CouchDB really works
>>>
>>>> I think the info-leakage argument is overblown
>>>
>>> It does make URLs easy to guess like many ecommerce sites that didn't
>>> validate when showing you an invoice - you added one to the numeric id in
>>> the URL and got to see someone else's.
>>>
>>> I would far prefer the size of the db file and the size of the _id link
>>> being addressed.
>>> Because the _id size can cause the db file to get so big,
>>> I/O etc is a lot slower mainly because there is just so much more file to
>>> deal with! (In my original posting I had a db go from 21GB to 4GB by
>>> reducing ids from 16 bytes to 4 bytes.)
>>>
>>> Roger
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.9 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>
>>> iEYEARECAAYFAktKb8AACgkQmOOfHg372QRb0ACfRWu1TUOs3twwmOGgAUOwhLfx
>>> FJkAoKgnkWnPayPtPqMfk3/AxOj2xaMx
>>> =V7Zq
>>> -----END PGP SIGNATURE-----
>>>
>>
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
>>
>
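
For anyone following the thread, the sequential scheme described in the
quoted 09:33 message (16-byte ids, a 13-byte random prefix kept for
around 8000 ids, suffixes strictly increasing under one prefix) can be
sketched roughly as below. This is only an illustration in Python, not
the couchdb code: the class name, the 3-byte suffix width, and the
random step size (picked so that a prefix lasts about 8000 ids) are all
assumptions made for the example.

```python
import secrets

PREFIX_BYTES = 13                       # from the message: first 13 bytes stay fixed
SUFFIX_BYTES = 16 - PREFIX_BYTES        # assumption: the remaining 3 bytes are the counter
SUFFIX_LIMIT = 1 << (8 * SUFFIX_BYTES)  # 2**24 possible suffix values
# Assumption: a random step between 1 and MAX_STEP averages about
# SUFFIX_LIMIT / 8000, so one prefix serves roughly 8000 ids.
MAX_STEP = SUFFIX_LIMIT // 4000


class SequentialIds:
    """Hypothetical generator mimicking the behaviour described above."""

    def __init__(self):
        self._new_prefix()

    def _step(self):
        # Random positive increment: keeps suffixes strictly increasing.
        return 1 + secrets.randbelow(MAX_STEP)

    def _new_prefix(self):
        # Rerandomize the 13-byte prefix and restart the suffix near zero.
        self.prefix = secrets.token_hex(PREFIX_BYTES)
        self.suffix = self._step()

    def next_id(self):
        if self.suffix >= SUFFIX_LIMIT:
            self._new_prefix()
        result = f"{self.prefix}{self.suffix:0{2 * SUFFIX_BYTES}x}"
        self.suffix += self._step()
        return result


if __name__ == "__main__":
    gen = SequentialIds()
    for _ in range(5):
        print(gen.next_id())  # 32 hex chars; same prefix, growing suffix
```

Because consecutive ids share a prefix and only the suffix grows, each
new id sorts immediately after the previous one, which is the property
credited above with helping insertion speed and b-tree behaviour.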