couchdb-dev mailing list archives

From "Paul Joseph Davis (JIRA)" <>
Subject [jira] Commented: (COUCHDB-485) 'startkey_docid' should function like 'startkey'
Date Thu, 27 Aug 2009 18:31:59 GMT


Paul Joseph Davis commented on COUCHDB-485:

I highly doubt that I would be a fan of a patch for this. Unless I'm missing a way to do
this efficiently, I'm pretty sure that this would need the same logic that would be required
to use an array filter.

For instance, consider the worst-case scenario. You have a view that emits the same key for
every document. Your query then has a startkey that collates before the key, and a startkey_docid
that collates to the last document id. The query would then have to seek through the entire view,
which is unbounded in size. It could be argued that allowing skip=N provides the same sinkhole
in terms of efficiency, but that seems a bit more of an obvious user choice to me.
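
The worst case above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: a sorted list of (key, docid) tuples stands in for the view btree, and independent_docid_filter is a hypothetical name for one reading of the behaviour the ticket asks for (filter on docid independently of the key). It is not CouchDB code.

```python
from bisect import bisect_left

N = 1000
# Every document emits the same key, so the rows differ only by docid.
# The list is already sorted by (key, docid).
rows = [("same", f"doc{i:04d}") for i in range(N)]

def independent_docid_filter(rows, startkey, startkey_docid):
    """Hypothetical semantics: key >= startkey AND docid >= startkey_docid."""
    # Seeking to the first row with key >= startkey is cheap (binary search).
    start = bisect_left(rows, (startkey,))
    examined = 0
    matches = []
    # After the seek, every remaining row must be inspected, because docids
    # are only ordered within a single key, not across the whole view.
    for key, docid in rows[start:]:
        examined += 1
        if docid >= startkey_docid:
            matches.append((key, docid))
    return matches, examined

# startkey collates before the only key; startkey_docid is the last docid.
matches, examined = independent_docid_filter(rows, "a", "doc0999")
print(len(matches), examined)  # 1 row returned after examining all 1000
```

In this model the amount of work grows with the size of the view rather than with the size of the result, which is the efficiency concern raised above.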

startkey_docid can definitely be confusing until you learn that btrees are sorted by (Key,
DocId). And it's the same confusion that pops up with sorting arrays. However, it's not artificially
limiting, it's just a limit of slicing a collated list.
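
A minimal sketch of "slicing a collated list", assuming plain Python tuple ordering as a stand-in for CouchDB's view collation (which it is not, exactly). The rows and the slice_from helper are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical view rows, kept in (key, docid) order as described above.
rows = sorted([
    ("news", "2009-08-01-a"),
    ("news", "2009-08-25-b"),
    ("news", "2009-09-02-c"),
    ("sport", "2009-08-10-d"),
    ("sport", "2009-08-30-e"),
])

def slice_from(rows, startkey, startkey_docid=""):
    """Return every row at or after the composite position (startkey, startkey_docid)."""
    start = bisect_left(rows, (startkey, startkey_docid))
    return rows[start:]

# When startkey equals an emitted key, the docid acts as a tie-breaker
# and the slice starts partway through that key's rows.
print(slice_from(rows, "news", "2009-08-25")[0])  # ('news', '2009-08-25-b')

# When startkey collates before every emitted key, the docid never comes
# into play: all five rows are at or after ("a", anything).
print(len(slice_from(rows, "a", "2009-09")))  # 5
```

The second call shows why the docid does not behave as an independent filter in this model: it only orders rows that share a key.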

Also, re-reading the ticket, I think there's some confusion. If you select an identical key
range, then startkey_docid will select a range of documents as necessary. For instance, in
the slug case, if you emit(doc.category, null) and then query with
?startkey=category&startkey_docid=first_docid&endkey=category&endkey_docid=last_docid,
you will get back just the range of docids.
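
As a hedged illustration, the query string for that identical-key range could be built like this in Python. The server address, database, design document and view names are hypothetical, as is the docid_range_query helper. The sketch assumes that startkey and endkey are sent as JSON-encoded values while the two docid parameters are sent as plain strings.

```python
import json
from urllib.parse import urlencode

def docid_range_query(view_url, key, first_docid, last_docid):
    """Build a view URL that pins the key and ranges over docids within it."""
    params = {
        "startkey": json.dumps(key),    # JSON-encoded key
        "startkey_docid": first_docid,  # plain string
        "endkey": json.dumps(key),      # same key, so only docids vary
        "endkey_docid": last_docid,
    }
    return f"{view_url}?{urlencode(params)}"

# Hypothetical view path, for illustration only.
url = docid_range_query(
    "http://localhost:5984/blog/_design/posts/_view/by_category",
    "category",
    "first_docid",
    "last_docid",
)
print(url)
```

Because startkey and endkey are identical, the only thing left to vary across the selected rows is the document id, which is what makes the docid bounds meaningful here.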

> 'startkey_docid' should function like 'startkey'
> ------------------------------------------------
>                 Key: COUCHDB-485
>                 URL:
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: HTTP Interface
>    Affects Versions: 0.10
>         Environment: N/A
>            Reporter: Christopher Groskopf
>            Priority: Minor
> The 'startkey_docid' and 'endkey_docid' parameters provide a way of sub-selecting rows
> for pagination when a view emits many rows with identical key values.  However, it seems both
> confusing and unintentionally limiting that 'startkey_docid' does not function the same as
> 'startkey' with regard to how included documents are identified.
> By this I mean that if a group of data is emitted with ISO 8601 timestamps as keys
> (e.g. "2009-08-25T12:00:00Z") then it's possible to specify 'startkey="2009-08"' and include
> that example data, because it is collated after 'startkey'.  However, if those timestamps
> were emitted as doc ids instead of keys, 'startkey_docid' will only act to filter the data
> if it _exactly_ matches a doc id.  Specifying 'startkey_docid="2009-08"' would not filter
> the data at all, even if every selected row has the same key.
> The benefit of implementing this change is that views which emit many identical keys
> could be sub-filtered based on document id.  In the case of my application, the first portion
> of a document's id is a timestamp, so I would be able to select a chronological subset of
> rows after they had been filtered by key.  Another possible use case is where doc ids are
> slugs--this would make it possible to select an alphabetical range after specifying a category
> as a key parameter.
> I haven't looked under the hood and I have never written Erlang, so I have no way of
> accurately estimating how significant this change would be.  Unless I'm misunderstanding something,
> this change should not break existing code.
> Looking forward to reading any feedback/comments/alternatives.
> Thanks,
> Chris
