Date: Sun, 28 Dec 2008 15:12:08 +0100 (CET)
From: md@hudora.de
To: couchdb-dev@incubator.apache.org
Subject: Re: API suggestions

While writing something about using CouchDB I came across the issue of "slice indexes" (called startkey and endkey in CouchDB lingo). I found no
exact definition of startkey and endkey anywhere in the documentation. Testing reveals that when querying _all_docs and views, documents are returned in the interval [startkey, endkey] = (startkey <= k <= endkey).

I don't know if this was a conscious design decision, but I would like to propose a slightly different interpretation (and thus an API change): [startkey, endkey[ = (startkey <= k < endkey).

Both approaches are valid and used in the real world. Ruby's range slicing uses the first, inclusive ("right-closed" in math speak) approach:

>> l = [1,2,3,4]
>> l.slice(1..2)
=> [2, 3]

Python uses the second, exclusive ("right-open" in math speak) approach:

>>> l = [1,2,3,4]
>>> l[1:2]
[2]

For array indices both work fine, and which one to prefer is mostly a matter of habit. In spoken language both interpretations are used: "Have the software done until Saturday" probably means right-open to the client and right-closed to the coder.

But if you are working with keys that are more than array indexes (strings, for example), right-open is much easier to handle, because with a right-closed interval you have to *guess* the biggest value you want to get. The wiki page at http://wiki.apache.org/couchdb/View_collation contains an example of that problem. To get a list of all design documents it suggests using

startkey="_design/"&endkey="_design/ZZZZZZZZZ"
or
startkey="_design/"&endkey="_design/\u9999"

This breaks if a design document is named "ZZZZZZZZZTop" or "\u9999Iñtërnâtiônàlizætiøn". Such names might be unlikely, but we are computer scientists; "unlikely" is a bad approach to software engineering.

What we really want to ask CouchDB is to "get all documents with keys starting with '_design/'".

This is basically impossible to do with right-closed intervals. We could use startkey="_design/"&endkey="_design0" ('0' is the ASCII character after '/'), and this will work fine ...
until there is actually a document with the key "_design0" in the system. Unlikely, but ...

To make selection by intervals reliable, clients currently have to either guess the last key (the ZZZZ approach) or use the first key not to include (the _design0 approach) and then post-process the result, removing the last row returned if it exactly matches the given endkey value.

If CouchDB changed to a right-open interval approach, this post-processing would go away in most cases. See http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/ for two real-world examples.

At least for string keys and float keys, changing the meaning to [startkey, endkey[ would allow selections like

* "all strings starting with 'abc'" (startkey="abc"&endkey="abd")
* all numbers between 10.5 and 11 (10.5 <= k < 11)

It would also, hopefully, not break too much existing code. Since endkey already seems to be considered "fishy" (see the ZZZZ approach), most code seems to try to avoid the issue. For example, 'startkey="_design/"&endkey="_design/ZZZZZZZZZ"' would still work unless you have a design document named exactly "ZZZZZZZZZ".

Regards
Maximillian Dornseif
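To make the difference between the two interval semantics concrete, here is a minimal Python sketch; it is not CouchDB code. It assumes the index is a sorted Python list of keys compared with Python's built-in ordering (plain code points for strings), which is only a stand-in for CouchDB's real collation. All function names are hypothetical. It shows the current closed interval, the proposed half-open interval, today's client-side workaround of dropping rows equal to endkey, and how a prefix query becomes a single half-open interval.

```python
import bisect
import sys

# Assumption: keys live in a sorted list and are compared with Python's
# built-in ordering (code points for strings). This is NOT CouchDB's view
# collation; it only illustrates the interval semantics.


def closed_range(keys, startkey, endkey):
    """Behaviour observed in the mail: startkey <= k <= endkey."""
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return keys[lo:hi]


def half_open_range(keys, startkey, endkey):
    """Proposed behaviour: startkey <= k < endkey."""
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_left(keys, endkey)
    return keys[lo:hi]


def closed_range_without_end(keys, startkey, endkey):
    """Client-side workaround under closed semantics: query with the first
    key we do NOT want, then drop trailing rows equal to endkey (a view may
    emit the same key more than once, so drop all of them)."""
    rows = closed_range(keys, startkey, endkey)
    while rows and rows[-1] == endkey:
        rows.pop()
    return rows


def prefix_upper_bound(prefix):
    """Smallest string greater than every string starting with `prefix`
    under code-point ordering: increment the last character. Characters at
    the maximum code point cannot be incremented, so they are stripped
    first. None means no upper bound is needed (every key >= prefix
    starts with it)."""
    p = prefix.rstrip(chr(sys.maxunicode))
    if not p:
        return None
    return p[:-1] + chr(ord(p[-1]) + 1)


def prefix_query(keys, prefix):
    """'All keys starting with prefix' expressed as one half-open interval."""
    end = prefix_upper_bound(prefix)
    if end is None:
        return keys[bisect.bisect_left(keys, prefix):]
    return half_open_range(keys, prefix, end)


keys = sorted(["_design/blog", "_design/shop", "_design0", "doc-a", "doc-b"])
print(closed_range(keys, "_design/", "_design0"))
# -> ['_design/blog', '_design/shop', '_design0']
print(half_open_range(keys, "_design/", "_design0"))
# -> ['_design/blog', '_design/shop']
print(prefix_query(keys, "_design/"))
# -> ['_design/blog', '_design/shop']
```

With half-open semantics the "_design0" key falls out naturally and no post-processing is needed. The "increment the last character" trick relies on plain code-point ordering; whether it carries over to CouchDB's Unicode-based view collation would have to be checked against the actual collation rules.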