From: Garren Smith
Date: Tue, 31 Mar 2020 12:13:26 +0200
Subject: Re: _all_docs collation
To: CouchDB Developers
Reply-To: dev@couchdb.apache.org
In-Reply-To: <5F7563A4-2FCD-4518-ADBC-F83A50061AC9@apache.org>
References: <5F7563A4-2FCD-4518-ADBC-F83A50061AC9@apache.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0000000000002b30b205a223d25c"

--0000000000002b30b205a223d25c
Content-Type: text/plain; charset="UTF-8"

Awesome. Thanks for explaining that. I imagined it had good historical
reasoning. I've changed _all_docs in fdb to follow the raw collation
https://github.com/apache/couchdb/commit/9b325b75814418b85ffb3642a5115635416f56a8

On Tue, Mar 31, 2020 at 11:07 AM Jan Lehnardt wrote:

>
>
> > On 26. Mar 2020, at 11:18, Garren Smith wrote:
> >
> > Oh interesting, reading the documentation more carefully I see we have raw
> > collation
> > https://docs.couchdb.org/en/stable/ddocs/views/collation.html#raw-collation
> > So _all_docs is using that and that explains why an object comes before a
> > string.
> > So do we want to keep raw collation for _all_docs?
>
>
> The reason for this is a simplified codepath and maybe even performance
> for regular database operations. _all_docs internally is the by-id index
> that performs any and all document reads and writes, so the original design
> tried to make this as lean as possible generally. Since we do Unicode
> collation in a NIF, that’s an extra step we did not want to take at the
> time.
>
> I can’t judge the impact of this for FDB since we already have to do
> key-mangling, is another NIF call there that much of a problem? Has it ever
> been? NIFs have vastly improved since the original design, so I don’t
> really know. If it doesn’t make a performance difference, I would not
> object to changing the behaviour, if it would simplify our _all_docs code.
> That said, since we have the raw option and want to keep it, we’ll have two
> paths anyway and switching the default for one route doesn’t sound like a
> hard problem.
>
> That leaves compatibility. I’d wager that there are few cases which rely
> on raw collation in _all_docs, and for those, it’d be easy enough to adjust
> to the new world.
> That said, if there is no overwhelming reason to change
> the current behaviour, I’d say we keep things as-is.
>
> Best
> Jan
> —
>
> >
> >
> > On Thu, Mar 26, 2020 at 11:45 AM Glynn Bird wrote:
> >
> >> It's not something I was aware of, but it's certainly a known "feature",
> >> documented here:
> >> https://docs.couchdb.org/en/stable/ddocs/views/collation.html#all-docs
> >>
> >> (probably because all keys are strings in all_docs, whereas they can be all
> >> sorts of mixed types with a view, and ascii collation would be faster with
> >> that assumption)
> >>
> >> On Thu, 26 Mar 2020 at 07:12, Garren Smith wrote:
> >>
> >>> Hi Everyone,
> >>>
> >>> While working on the Mango implementation for FDB, I've noticed that
> >>> _all_docs has some weird ordering collation. If you do something like
> >>> GET /db/_all_docs?startkey={}
> >>> it will return all the documents even though in view collation an object is
> >>> ordered after strings. The reason I've noticed this is that in the
> >>> pouchdb-find tests we have a few tests that check that {selector: {_id:
> >>> {$gt: {}}}} returns all the docs in the database [0].
> >>>
> >>> This ordering feels wrong to me, but I'm guessing it's been around for a
> >>> while. Currently for _all_docs on FDB, we have it that if you did the above
> >>> startkey query, it would not return any documents as we are following the
> >>> view collation ordering.
> >>>
> >>> I want to know whether we should keep the old _all_docs ordering or rather
> >>> standardize on view collation ordering everywhere?
> >>>
> >>> I would prefer we change it, but I'm not sure of the implications of that for
> >>> client libraries and users.
> >>> Changing it would be a breaking change, but since 4.0 is going to be a lot
> >>> of breaking changes I think this would be our best chance to do this.
> >>>
> >>> Cheers
> >>> Garren
> >>>
> >>>
> >>>
> >>> [0]
> >>> https://github.com/nolanlawson/pouchdb-find/commit/e1ca2e2d18041f05a3d19bce4254f4d7b349ad20
> >>>
> >>
> >

--0000000000002b30b205a223d25c--
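
For anyone reading this thread later, here is a minimal sketch of why startkey={} behaves differently under the two collations. It is only a toy model, not CouchDB code: it compares keys by JSON type rank alone, using the type orders given in the collation documentation linked above, and it falls back to plain Python string comparison as a stand-in for real string collation. The names VIEW_RANK, RAW_RANK, json_type and ids_from_startkey are made up for illustration.

```python
# Toy model of CouchDB key ordering by JSON type (assumption: not real CouchDB code).

# View collation: null < false < true < numbers < strings < arrays < objects.
VIEW_RANK = {"null": 0, "false": 1, "true": 2, "number": 3,
             "string": 4, "array": 5, "object": 6}

# Raw collation: numbers < false < null < true < objects < arrays < strings.
RAW_RANK = {"number": 0, "false": 1, "null": 2, "true": 3,
            "object": 4, "array": 5, "string": 6}


def json_type(value):
    """Map a decoded JSON value to the type names used in the rank tables."""
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def ids_from_startkey(doc_ids, startkey, rank):
    """Return the doc IDs (always strings) that sort at or after startkey."""
    start_type = json_type(startkey)
    if start_type == "string":
        # Stand-in for real string collation: plain code point comparison.
        return sorted(d for d in doc_ids if d >= startkey)
    if rank["string"] > rank[start_type]:
        # Every string ID sorts after the startkey's type.
        return sorted(doc_ids)
    # Every string ID sorts before the startkey's type.
    return []


ids = ["a", "b", "c"]
print(ids_from_startkey(ids, {}, RAW_RANK))   # all IDs: objects sort before strings
print(ids_from_startkey(ids, {}, VIEW_RANK))  # no IDs: objects sort after strings
```

Under the raw table an object startkey sits below every string ID, so the whole by-id index is returned; under the view table it sits above them, so nothing is.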