Message-ID: <841067607.1233944999550.JavaMail.jira@brutus>
Date: Fri, 6 Feb 2009 10:29:59 -0800 (PST)
From: "Chris Anderson (JIRA)"
To: dev@couchdb.apache.org
Subject: [jira] Commented: (COUCHDB-194) [startkey, endkey[: provide a right-open range selection method

    [ https://issues.apache.org/jira/browse/COUCHDB-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671231#action_12671231 ]

Chris Anderson commented on COUCHDB-194:
----------------------------------------

I just want to express my support
for the change. Let's not forget this one; it will be a fair amount of work, I think.

> [startkey, endkey[: provide a right-open range selection method
> ---------------------------------------------------------------
>
>                 Key: COUCHDB-194
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-194
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: HTTP Interface
>    Affects Versions: 0.9
>            Reporter: Maximillian Dornseif
>            Priority: Blocker
>             Fix For: 1.0
>
>
> While writing something about using CouchDB I came across the issue of "slice indexes" (called startkey and endkey in CouchDB lingo).
> I found no exact definition of startkey and endkey anywhere in the documentation. Testing reveals that on _all_docs and on views, documents are returned in the interval
> [startkey, endkey] = (startkey <= k <= endkey).
> I don't know if this was a conscious design decision. But I would like to promote a slightly different interpretation (and thus an API change):
> [startkey, endkey[ = (startkey <= k < endkey).
> Both approaches are valid and used in the real world. Ruby uses the inclusive ("right-closed" in math speak) first approach:
> >> l = [1,2,3,4]
> >> l.slice(1..2)
> => [2, 3]
> Python uses the exclusive ("right-open" in math speak) second approach:
> >>> l = [1,2,3,4]
> >>> l[1:2]
> [2]
> For array indices both work fine, and which one to prefer is mostly an issue of habit. In spoken language both approaches are used: "Have the software done by Saturday" probably means right-open to the client and right-closed to the coder.
> But if you are working with keys that are more than array indexes, then right-open is much easier to handle. That is because with a right-closed interval you have to *guess* the biggest value you want to get.
> The Wiki at http://wiki.apache.org/couchdb/View_collation contains an example of that problem:
> It is suggested that you use
> startkey="_design/"&endkey="_design/ZZZZZZZZZ"
> or
> startkey="_design/"&endkey="_design/\u9999"
> to get a list of all design documents - also the replication system in the db core uses the same hack.
> This breaks if a design document is named "ZZZZZZZZZTop" or "\u9999Iñtërnâtiônàlizætiøn". Such names might be unlikely, but we are computer scientists; "unlikely" is a bad approach to software engineering.
> The thing we really want to ask CouchDB is to "get all documents with keys starting with '_design/'".
> This is basically impossible to do with right-closed intervals. We could use startkey="_design/"&endkey="_design0" ('0' is the ASCII character after '/') and this will work fine ... until there is actually a document with the key "_design0" in the system. Unlikely, but ...
> To make selection by intervals reliable, clients currently have to guess the last key (the ZZZZ approach) or use the first key not to include (the _design0 approach) and then post-process the result to remove the last element returned if it exactly matches the given endkey value.
> If CouchDB changed to a right-open interval approach, post-processing would go away in most cases. See http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/ for two real world examples.
> At least for string keys and float keys, changing the meaning to [startkey, endkey[ would allow selections like
> * "all strings starting with 'abc'"
> * all numbers between 10.5 and 11
> It also would hopefully not break too much existing code. Since the notion of endkey seems to be already considered "fishy" (see the ZZZZZ approach), most code seems to try to avoid that issue.
> For example 'startkey="_design/"&endkey="_design/ZZZZZZZZZ"' would still work unless you have a design document named exactly "ZZZZZZZZZ".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
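
The two interval semantics from the issue description can be compared with a minimal Python sketch. It rests on several assumptions: the keys live in a sorted in-memory list, they are compared by Python's code-point string ordering (not CouchDB's collation, which may order keys differently), and the names closed_range, right_open_range and prefix_end are hypothetical helpers, not part of any CouchDB API.

```python
from bisect import bisect_left, bisect_right


def closed_range(sorted_keys, startkey, endkey):
    """Return keys k with startkey <= k <= endkey (right-closed)."""
    lo = bisect_left(sorted_keys, startkey)
    hi = bisect_right(sorted_keys, endkey)  # include keys equal to endkey
    return sorted_keys[lo:hi]


def right_open_range(sorted_keys, startkey, endkey):
    """Return keys k with startkey <= k < endkey (right-open)."""
    lo = bisect_left(sorted_keys, startkey)
    hi = bisect_left(sorted_keys, endkey)  # exclude keys equal to endkey
    return sorted_keys[lo:hi]


def prefix_end(prefix):
    """Hypothetical helper: the first key that no longer starts with `prefix`.

    Bumps the last character by one code point, so "_design/" becomes
    "_design0". Assumes a non-empty prefix whose last character is not
    the maximum code point, and code-point ordering of keys.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


# Illustrative keys only; sorted by Python's code-point ordering.
keys = sorted([
    "_design/a",
    "_design/views",
    "_design/\u9999Iñtërnâtiônàlizætiøn",
    "_design0",
    "doc1",
])

# Guessing the last key: the name that sorts after the guess is missed.
guessed = closed_range(keys, "_design/", "_design/\u9999")

# Closed interval up to "_design0": the stray "_design0" key is returned
# and would have to be removed by post-processing.
with_stray = closed_range(keys, "_design/", "_design0")

# Right-open interval: exactly the keys starting with "_design/".
by_prefix = right_open_range(keys, "_design/", prefix_end("_design/"))

print(guessed)
print(with_stray)
print(by_prefix)
```

Under these assumptions the right-open query needs neither a guessed last key nor a post-processing step, which is the point the issue makes about prefix selections.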