Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
Message-ID: <779238461.1251397919256.JavaMail.jira@brutus>
Date: Thu, 27 Aug 2009 11:31:59 -0700 (PDT)
From: "Paul Joseph Davis (JIRA)" <jira@apache.org>
To: dev@couchdb.apache.org
Subject: [jira] Commented: (COUCHDB-485) 'startkey_docid' should function
 like 'startkey'
In-Reply-To: <724884641.1251224639418.JavaMail.jira@brutus>


    [ https://issues.apache.org/jira/browse/COUCHDB-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748492#action_12748492 ] 

Paul Joseph Davis commented on COUCHDB-485:
-------------------------------------------

I highly doubt that I would be a fan of a patch for this. Unless I'm missing a way to do this efficiently, I'm pretty sure this would require the same logic as an array filter.

For instance, consider the worst-case scenario: you have a view that emits the same key for every document. Your query has a startkey that collates before that key, and a startkey_docid that collates to the last document id. The query would then have to seek through the entire view, whose size is unbounded. It could be argued that allowing skip=N creates the same efficiency sinkhole, but that seems like a more obvious user choice to me.
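The worst case above can be sketched in a few lines. This is a toy model, not CouchDB's code: the view is a Python list of (key, docid) tuples kept in sorted order, Python string comparison stands in for CouchDB's collation, and `proposed_docid_filter` is a hypothetical helper modelling the semantics the ticket asks for. It shows that a docid condition applied independently of the key cannot be answered by one seek; every row after startkey has to be visited.

```python
from bisect import bisect_left

def proposed_docid_filter(rows, startkey, startkey_docid):
    """Hypothetical ticket semantics: start at startkey, then keep only rows
    whose docid collates at or after startkey_docid, whatever their key.
    Returns (matches, rows_examined)."""
    # Seeking to startkey is cheap: ("a",) sorts before every ("a", docid),
    # so one binary search finds the first row with key >= startkey.
    i = bisect_left(rows, (startkey,))
    matches, examined = [], 0
    # The docid test does not describe a contiguous slice of (key, docid)
    # order, so every remaining row must be visited and tested.
    for key, docid in rows[i:]:
        examined += 1
        if docid >= startkey_docid:
            matches.append((key, docid))
    return matches, examined

def current_slice(rows, startkey, startkey_docid):
    """Current behaviour: one seek to the tuple (startkey, startkey_docid)."""
    return rows[bisect_left(rows, (startkey, startkey_docid)):]

# Worst case: every document emits the same key.
rows = [("same", f"doc{i:05d}") for i in range(10_000)]

matches, examined = proposed_docid_filter(rows, "a", "doc09999")
print(matches, examined)  # [('same', 'doc09999')] 10000

# With the tuple seek, ("a", "doc09999") sorts before every row,
# so the slice simply starts at the first row.
print(len(current_slice(rows, "a", "doc09999")))  # 10000
```

One matching row costs 10,000 row visits under the proposed semantics, and that cost grows with the view.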

startkey_docid can definitely be confusing until you learn that btrees are sorted by (Key, DocId), and it's the same confusion that pops up with sorting arrays. However, it's not artificially limiting; it's just a limit of slicing a collated list.
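A minimal sketch of slicing a list collated by (Key, DocId), assuming a simplified index: a sorted Python list of tuples with plain string comparison standing in for CouchDB's collation. `rows_from` is a hypothetical helper, not a CouchDB function. It shows that startkey_docid only breaks ties between rows whose key equals startkey, which is why a docid bound seems to do nothing when startkey matches no row exactly.

```python
from bisect import bisect_left

# Simplified view index: (key, docid) rows in sorted order, like the btree.
rows = [
    ("apple", "doc3"),
    ("banana", "doc1"),
    ("banana", "doc2"),
    ("banana", "doc5"),
    ("cherry", "doc4"),
]

def rows_from(rows, startkey, startkey_docid=""):
    """Slice starting at the first row sorting at or after (startkey, startkey_docid)."""
    return rows[bisect_left(rows, (startkey, startkey_docid)):]

# startkey_docid picks the starting row among rows whose key equals startkey...
print(rows_from(rows, "banana", "doc2"))
# [('banana', 'doc2'), ('banana', 'doc5'), ('cherry', 'doc4')]

# ...and within that key it is an ordinary lower bound, so a prefix works.
print(rows_from(rows, "banana", "doc"))
# [('banana', 'doc1'), ('banana', 'doc2'), ('banana', 'doc5'), ('cherry', 'doc4')]

# If startkey matches no row exactly, the docid has no effect:
# ("b", "doc9") already sorts before ("banana", "doc1").
print(rows_from(rows, "b", "doc9"))
# [('banana', 'doc1'), ('banana', 'doc2'), ('banana', 'doc5'), ('cherry', 'doc4')]
```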

Also, re-reading the ticket, I think there's some confusion. If startkey and endkey select an identical key, then startkey_docid and endkey_docid will select a range of documents as needed. For instance, in the slug case, if you emit(doc.category, null) and query ?startkey=category&startkey_docid=first_docid&endkey=category&endkey_docid=last_docid, you will get back just that range of docids.
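The slug query above could be built as follows. This is a sketch under assumptions: the view URL, database, design document and view names are hypothetical, the view is assumed to use the emit(doc.category, null) map from the comment, and it follows the usual CouchDB convention that startkey/endkey are JSON values (so a string key is sent with quotes) while startkey_docid/endkey_docid are raw document id strings.

```python
import json
from urllib.parse import urlencode

def category_range_query(view_url, category, first_docid, last_docid):
    """Build a view query for rows emitted with key == category whose
    docids run from first_docid to last_docid."""
    params = {
        # View keys are JSON values, so the string key is JSON-encoded.
        "startkey": json.dumps(category),
        # Document ids are passed as plain strings.
        "startkey_docid": first_docid,
        "endkey": json.dumps(category),
        "endkey_docid": last_docid,
    }
    return f"{view_url}?{urlencode(params)}"

# Hypothetical database, design document and view names.
url = category_range_query(
    "http://localhost:5984/recipes/_design/site/_view/by_category",
    "desserts", "apple-pie", "banana-bread",
)
print(url)
# http://localhost:5984/recipes/_design/site/_view/by_category?startkey=%22desserts%22&startkey_docid=apple-pie&endkey=%22desserts%22&endkey_docid=banana-bread
```

Because startkey and endkey are the same key, the docid parameters are the only thing narrowing the slice, which is exactly the range-of-slugs case.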

> 'startkey_docid' should function like 'startkey'
> ------------------------------------------------
>
>                 Key: COUCHDB-485
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-485
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: HTTP Interface
>    Affects Versions: 0.10
>         Environment: N/A
>            Reporter: Christopher Groskopf
>            Priority: Minor
>
> The 'startkey_docid' and 'endkey_docid' parameters provide a way of sub-selecting rows for pagination when a view emits many rows with identical key values.  However, it seems both confusing and unintentionally limiting that 'startkey_docid' does not function the same as 'startkey' with regard to how included documents are identified.
> By this I mean that if a group of data is emitted with ISO 8601 timestamps as keys (e.g. "2009-08-25T12:00:00Z"), then it's possible to specify 'startkey="2009-08"' and include that example data, because it is collated after 'startkey'.  However, if those timestamps were emitted as doc ids instead of keys, 'startkey_docid' will only act to filter the data if it _exactly_ matches a doc id.  Specifying 'startkey_docid="2009-08"' would not filter the data at all, even if every selected row has the same key.
> The benefit of implementing this change is that views which emit many identical keys could be sub-filtered based on document id.  In the case of my application, the first portion of a document's id is a timestamp, so I would be able to select a chronological subset of rows after they had been filtered by key.  Another possible use case is where doc ids are slugs--this would make it possible to select an alphabetical range after specifying a category as a key parameter.
> I haven't looked under the hood and I have never written Erlang, so I have no way of accurately estimating how significant this change would be.  Unless I'm misunderstanding something, this change should not break existing code.
> Looking forward to reading any feedback/comments/alternatives.
> Thanks,
> Chris

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.