couchdb-dev mailing list archives

From Garren Smith <gar...@apache.org>
Subject Re: _all_docs collation
Date Tue, 31 Mar 2020 10:13:26 GMT
Awesome. Thanks for explaining that. I imagined there was a good historical
reason for it. I've changed _all_docs in fdb to follow the raw collation:
https://github.com/apache/couchdb/commit/9b325b75814418b85ffb3642a5115635416f56a8
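
For anyone skimming the thread, here is a rough Python sketch of why
startkey={} behaves so differently under the two collations. It is not CouchDB
code: the rank tables and the names json_type and ids_from_startkey are made up
for illustration, and the type orderings are taken from the collation docs
linked further down. Strings are compared by plain code point here, which is
only an approximation (view collation really uses ICU).

```python
# Type ranks, lowest sorts first. Illustrative tables, not CouchDB source.
# Raw collation: numbers < false/null/true < objects < arrays < strings.
RAW_RANK = {"number": 0, "atom": 1, "object": 2, "array": 3, "string": 4}
# View collation: null/false/true < numbers < strings < arrays < objects.
VIEW_RANK = {"atom": 0, "number": 1, "string": 2, "array": 3, "object": 4}


def json_type(value):
    """Classify a decoded JSON value into one of the five rank groups."""
    # bool is tested before int/float because bool is a subclass of int.
    if value is None or isinstance(value, bool):
        return "atom"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def ids_from_startkey(doc_ids, startkey, rank):
    """Return the doc ids that sort at or after startkey under the given ranks."""
    start_type = json_type(startkey)
    result = []
    for doc_id in sorted(doc_ids):
        id_type = json_type(doc_id)
        if rank[id_type] > rank[start_type]:
            # A higher-ranked type always sorts after the start key.
            result.append(doc_id)
        elif id_type == start_type == "string" and doc_id >= startkey:
            # Same type: fall back to a plain string comparison.
            result.append(doc_id)
    return result


ids = ["a", "b", "c"]
print(ids_from_startkey(ids, {}, RAW_RANK))   # ['a', 'b', 'c']
print(ids_from_startkey(ids, {}, VIEW_RANK))  # []
```

Under the raw ranks every string id sorts after an object, so the query returns
everything; under the view ranks strings sort before objects, so nothing
matches.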

On Tue, Mar 31, 2020 at 11:07 AM Jan Lehnardt <jan@apache.org> wrote:

>
>
> > On 26. Mar 2020, at 11:18, Garren Smith <garren@apache.org> wrote:
> >
> > Oh interesting, reading the documentation more carefully I see we have
> > raw collation
> > https://docs.couchdb.org/en/stable/ddocs/views/collation.html#raw-collation
> > So _all_docs is using that and that explains why an object comes before a
> > string.
> > So do we want to keep raw collation for _all_docs?
>
>
> The reason for this is a simplified codepath and maybe even performance
> for regular database operations. _all_docs internally is the by-id index
> that performs any and all document reads and writes, so the original design
> tried to make this as lean as possible generally. Since we do Unicode
> collation in a NIF, that’s an extra step we did not want to take at the
> time.
>
> I can’t judge the impact of this for FDB, since we already have to do
> key-mangling. Is another NIF call there that much of a problem? Has it ever
> been? NIFs have vastly improved since the original design, so I don’t
> really know. If it doesn’t make a performance difference, I would not
> object to changing the behaviour, if it would simplify our _all_docs code.
> That said, since we have the raw option and want to keep it, we’ll have two
> paths anyway and switching the default for one route doesn’t sound like a
> hard problem.
>
> That leaves compatibility. I’d wager that there are few cases which rely
> on raw collation in _all_docs, and for those, it’d be easy enough to adjust
> to the new world. That said, if there is no overwhelming reason to change
> the current behaviour, I’d say we keep things as-is.
>
> Best
> Jan
> —
>
>
> >
> > On Thu, Mar 26, 2020 at 11:45 AM Glynn Bird <glynn.bird@gmail.com> wrote:
> >
> >> It's not something I was aware of, but it's certainly a known "feature",
> >> documented here:
> >> https://docs.couchdb.org/en/stable/ddocs/views/collation.html#all-docs
> >>
> >> (probably because all keys are strings in _all_docs, whereas they can be
> >> all sorts of mixed types with a view, and ASCII collation would be faster
> >> with that assumption)
> >>
> >> On Thu, 26 Mar 2020 at 07:12, Garren Smith <garren@apache.org> wrote:
> >>
> >>> Hi Everyone,
> >>>
> >>> While working on the Mango implementation for FDB, I've noticed that
> >>> _all_docs has some weird collation ordering. If you do something like
> >>> GET /db/_all_docs?startkey={}
> >>> it will return all the documents, even though in view collation an
> >>> object is ordered after strings. The reason I've noticed this is that in
> >>> the pouchdb-find tests we have a few tests that check that
> >>> {selector: {_id: {$gt: {}}}} returns all the docs in the database [0].
> >>>
> >>> This ordering feels wrong to me, but I'm guessing it's been around for a
> >>> while. Currently for _all_docs on FDB, if you run the above startkey
> >>> query, it does not return any documents, as we are following the view
> >>> collation ordering.
> >>>
> >>> I want to know whether we should keep the old _all_docs ordering or
> >>> rather standardize on view collation ordering everywhere?
> >>>
> >>> I would prefer we change it, but I'm not sure of the implications of that
> >>> for client libraries and users.
> >>> Changing it would be a breaking change, but since 4.0 is going to have a
> >>> lot of breaking changes, I think this would be our best chance to do this.
> >>>
> >>> Cheers
> >>> Garren
> >>>
> >>>
> >>>
> >>> [0]
> >>> https://github.com/nolanlawson/pouchdb-find/commit/e1ca2e2d18041f05a3d19bce4254f4d7b349ad20
>
>
