couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: _all_docs collation
Date Tue, 31 Mar 2020 09:07:51 GMT


> On 26. Mar 2020, at 11:18, Garren Smith <garren@apache.org> wrote:
> 
> Oh interesting, reading the documentation more carefully I see we have raw
> collation:
> https://docs.couchdb.org/en/stable/ddocs/views/collation.html#raw-collation
> So _all_docs is using that and that explains why an object comes before a
> string.
> So do we want to keep raw collation for _all_docs?


The reason for this is a simpler code path, and possibly better performance, for regular database
operations. Internally, _all_docs is the by-id index that serves every document read
and write, so the original design tried to keep it as lean as possible. Since we
do Unicode collation in a NIF, that was an extra step we did not want to take at the time.
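A rough Python model of the two orderings may make the startkey={} behaviour concrete. It assumes the type orders described in the CouchDB collation docs as I read them: raw collation follows Erlang term order of the decoded JSON (numbers < false < null < true < objects < arrays < strings), while the default view collation uses null < false < true < numbers < strings < arrays < objects. It compares strings by code point instead of ICU and ignores nested array/object comparison, so treat it as a sketch, not CouchDB's implementation; the helper names are hypothetical.

```python
from functools import cmp_to_key

# Type order under the default (ICU-based) view collation.
VIEW_ORDER = ["null", "false", "true", "number", "string", "array", "object"]
# Type order under raw collation (assumed: Erlang term order of decoded JSON).
RAW_ORDER = ["number", "false", "null", "true", "object", "array", "string"]


def json_kind(value):
    """Name the JSON type of a decoded value (bools checked before numbers)."""
    if value is None:
        return "null"
    if value is False:
        return "false"
    if value is True:
        return "true"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def compare(a, b, order):
    """Compare two keys: first by type rank, then (simplified) by value."""
    ra, rb = order.index(json_kind(a)), order.index(json_kind(b))
    if ra != rb:
        return -1 if ra < rb else 1
    if isinstance(a, (str, int, float)) and not isinstance(a, bool):
        # Code-point / numeric comparison; real view collation uses ICU for strings.
        return (a > b) - (a < b)
    # null/true/false are singletons; nested arrays/objects are not modelled.
    return 0


def sort_keys(keys, order):
    """Sort keys using the given type order."""
    return sorted(keys, key=cmp_to_key(lambda x, y: compare(x, y, order)))


def rows_from(keys, startkey, order):
    """Keys a range query starting at startkey (inclusive) would return."""
    return [k for k in sort_keys(keys, order) if compare(k, startkey, order) >= 0]


doc_ids = ["apple", "banana", "cherry"]
print(rows_from(doc_ids, {}, RAW_ORDER))   # ['apple', 'banana', 'cherry']
print(rows_from(doc_ids, {}, VIEW_ORDER))  # []
```

Under the raw order every string _id sorts after the object key {}, so the range covers all rows; under view collation no string sorts after an object, so the same range is empty.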

I can’t judge the impact of this for FDB. Since we already have to do key mangling there, is
another NIF call that much of a problem? Has it ever been? NIFs have vastly improved since the
original design, so I don’t really know. If it doesn’t make a performance difference,
I would not object to changing the behaviour if it simplifies our _all_docs code. That
said, since we have the raw option and want to keep it, we’ll have two code paths anyway, and
switching the default for one route doesn’t sound like a hard problem.
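For reference, the raw option for views is the per-view "options": {"collation": "raw"} setting described in the collation docs. Below is a minimal Python sketch of a design document that opts a single view into it; the helper make_design_doc and the "by_id" view name are hypothetical, chosen only for illustration.

```python
import json


def make_design_doc(view_name, map_fn, raw_collation=False):
    """Build a design document containing a single view.

    When raw_collation is True, the view opts into raw collation via the
    "options": {"collation": "raw"} setting; otherwise the view keeps the
    default Unicode view collation.
    """
    view = {"map": map_fn}
    if raw_collation:
        view["options"] = {"collation": "raw"}
    return {"_id": f"_design/{view_name}", "views": {view_name: view}}


ddoc = make_design_doc(
    "by_id", "function (doc) { emit(doc._id, null); }", raw_collation=True
)
print(json.dumps(ddoc, indent=2))
```

Keeping the option per view means the default code path stays on view collation, and only views that explicitly ask for raw ordering take the other path.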

That leaves compatibility. I’d wager that few applications rely on raw collation
in _all_docs, and those could be adjusted to the new behaviour easily enough. Still,
if there is no overwhelming reason to change the current behaviour, I’d say we keep things
as-is.

Best
Jan
—


> 
> On Thu, Mar 26, 2020 at 11:45 AM Glynn Bird <glynn.bird@gmail.com> wrote:
> 
>> It's not something I was aware of, but it's certainly a known "feature",
>> documented here:
>> https://docs.couchdb.org/en/stable/ddocs/views/collation.html#all-docs
>> 
>> (probably because all keys are strings in all_docs, whereas they can be all
>> sorts of mixed types with a view, and ascii collation would be faster with
>> that assumption)
>> 
>> On Thu, 26 Mar 2020 at 07:12, Garren Smith <garren@apache.org> wrote:
>> 
>>> Hi Everyone,
>>> 
>>> While working on the Mango implementation for FDB, I've noticed that
>>> _all_docs has some weird ordering collation. If you do something like
>>> GET /db/_all_docs?startkey={} it will return all the documents, even
>>> though in view collation an object is ordered after strings. The reason
>>> I noticed this is that the pouchdb-find tests include a few that check
>>> that {selector: {_id: {$gt: {}}}} returns all the docs in the database [0].
>>> 
>>> This ordering feels wrong to me, but I'm guessing it's been around for a
>>> while. Currently for _all_docs on FDB, if you did the above startkey
>>> query, it would not return any documents, as we are following the view
>>> collation ordering.
>>> 
>>> I want to know whether we should keep the old _all_docs ordering or
>>> standardize on view collation ordering everywhere.
>>> 
>>> I would prefer we change it, but I'm not sure of the implications of that
>>> for client libraries and users. Changing it would be a breaking change,
>>> but since 4.0 is going to have a lot of breaking changes, I think this
>>> would be our best chance to do this.
>>> 
>>> Cheers
>>> Garren
>>> 
>>> 
>>> 
>>> [0]
>>> https://github.com/nolanlawson/pouchdb-find/commit/e1ca2e2d18041f05a3d19bce4254f4d7b349ad20
>>> 
>> 

