couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: [jira] Commented: (COUCHDB-194) [startkey, endkey[: provide a right-open range selection method
Date Thu, 05 Feb 2009 22:06:37 GMT
I've been pondering this issue of the weird _design/ doc hack. I'd
either agree with Zach on having separately named keys for open or
closed bounds on *both* ends, or, specific to the string and array
types, a startswith parameter. I don't much like the startswith idea,
though, as it's not generally applicable.

Also, did I miss what you'd pass in the _design doc scenario as
endkey, assuming right-open semantics?
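One possible answer, sketched in Python under the assumption that keys compare by plain Unicode code point (as Python strings do; CouchDB's own collation may order characters differently): for a prefix query, pass the smallest string that sorts after every string with that prefix, which you get by incrementing the prefix's last character. The helper names `prefix_upper_bound` and `in_right_open` are hypothetical, not CouchDB APIs.

```python
def prefix_upper_bound(prefix: str) -> str:
    """Smallest string sorting after every string that starts with `prefix`,
    assuming plain code-point ordering (hypothetical helper)."""
    if not prefix:
        raise ValueError("an empty prefix has no finite upper bound")
    last = prefix[-1]
    if ord(last) == 0x10FFFF:
        # The maximum code point cannot be incremented: drop it and carry left.
        return prefix_upper_bound(prefix[:-1])
    return prefix[:-1] + chr(ord(last) + 1)


def in_right_open(key: str, startkey: str, endkey: str) -> bool:
    """Right-open membership test: startkey <= key < endkey."""
    return startkey <= key < endkey


endkey = prefix_upper_bound("_design/")
print(endkey)  # _design0
```

With right-open semantics, "_design0" then works as endkey even if a document with that exact id exists, because the bound itself is excluded.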

On Thu, Feb 5, 2009 at 4:57 PM, Zachary Zolton <zachary.zolton@gmail.com> wrote:
> Maximillian,
>
> I'd think both _could_ be useful.
>
> I mean in Ruby we have both for the right-hand boundary of ranges:
>  irb(main):005:0> (1..5).max
>  => 5
>  irb(main):006:0> (1...5).max
>  => 4
>
> IMHO, it would be better to use a different pair of parameter names,
> such that we could easily distinguish between open and closed bounds.
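A minimal Python sketch of that idea: one predicate where the openness of each bound is chosen independently, mirroring Ruby's `1..5` and `1...5`. The function and its keyword arguments (`start_inclusive`, `end_inclusive`) are hypothetical illustrations, not actual CouchDB query parameters.

```python
def in_range(key, start, end, *, start_inclusive=True, end_inclusive=True):
    """Check key against [start, end] with each bound's openness chosen separately.
    The keyword names are hypothetical, not real CouchDB parameters."""
    lower_ok = start <= key if start_inclusive else start < key
    upper_ok = key <= end if end_inclusive else key < end
    return lower_ok and upper_ok


keys = [1, 2, 3, 4, 5]
closed = [k for k in keys if in_range(k, 1, 5)]                           # like Ruby 1..5
right_open = [k for k in keys if in_range(k, 1, 5, end_inclusive=False)]  # like Ruby 1...5
print(max(closed), max(right_open))  # 5 4
```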
>
>
> Cheers,
>
> Zach
>
>
> PS. Is it "Maximillian" or "Max"?  :^D
>
> On Thu, Feb 5, 2009 at 3:32 PM, Maximillian Dornseif (JIRA)
> <jira@apache.org> wrote:
>>
>>    [ https://issues.apache.org/jira/browse/COUCHDB-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670911#action_12670911 ]
>>
>> Maximillian Dornseif commented on COUCHDB-194:
>> ----------------------------------------------
>>
>> So far nobody seems to be against it.
>>
>> The downside is that it MIGHT break some existing code.
>>
>>> [startkey, endkey[: provide a right-open range selection method
>>> ---------------------------------------------------------------
>>>
>>>                 Key: COUCHDB-194
>>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-194
>>>             Project: CouchDB
>>>          Issue Type: Improvement
>>>          Components: HTTP Interface
>>>    Affects Versions: 0.9
>>>            Reporter: Maximillian Dornseif
>>>            Priority: Blocker
>>>             Fix For: 1.0
>>>
>>>
>>> While writing something about using CouchDB I came across the issue of "slice indexes" (called startkey and endkey in CouchDB lingo).
>>> I found no exact definition of startkey and endkey anywhere in the documentation. Testing reveals that, on both _all_docs and views, documents are returned in the interval
>>> [startkey, endkey] = (startkey <= k <= endkey).
>>> I don't know if this was a conscious design decision, but I'd like to promote a slightly different interpretation (and thus an API change):
>>> [startkey, endkey[ = (startkey <= k < endkey).
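The two interpretations can be sketched in Python over a sorted list that stands in for a view index (an assumption: CouchDB's B-tree and collation are not modelled). The only difference is whether the upper cut uses `bisect_right`, which keeps keys equal to endkey, or `bisect_left`, which stops before them.

```python
from bisect import bisect_left, bisect_right


def select_closed(sorted_keys, startkey, endkey):
    """[startkey, endkey]: startkey <= k <= endkey (the behaviour observed in testing)."""
    lo = bisect_left(sorted_keys, startkey)
    hi = bisect_right(sorted_keys, endkey)  # includes keys equal to endkey
    return sorted_keys[lo:hi]


def select_right_open(sorted_keys, startkey, endkey):
    """[startkey, endkey[: startkey <= k < endkey (the proposed behaviour)."""
    lo = bisect_left(sorted_keys, startkey)
    hi = bisect_left(sorted_keys, endkey)  # stops before keys equal to endkey
    return sorted_keys[lo:hi]


keys = ["a", "b", "c", "d"]
print(select_closed(keys, "b", "c"))      # ['b', 'c']
print(select_right_open(keys, "b", "c"))  # ['b']
```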
>>> Both approaches are valid and used in the real world. Ruby uses the first, inclusive ("right-closed" in math speak) approach:
>>> >> l = [1,2,3,4]
>>> >> l.slice(1..2)
>>> => [2, 3]
>>> Python uses the second, exclusive ("right-open" in math speak) approach:
>>> >>> l = [1,2,3,4]
>>> >>> l[1:2]
>>> [2]
>>> For array indices both work fine, and which one to prefer is mostly a matter of habit. Spoken language uses both approaches: "Have the software done until Saturday" probably means right-open to the client and right-closed to the coder.
>>> But if you are working with keys that are more than array indexes, then right-open is much easier to handle, because with a right-closed interval you have to *guess* the biggest value you want to get. The wiki at http://wiki.apache.org/couchdb/View_collation contains an example of that problem:
>>> It is suggested that you use
>>> startkey="_design/"&endkey="_design/ZZZZZZZZZ"
>>> or
>>> startkey="_design/"&endkey="_design/\u9999"
>>> to get a list of all design documents; the replication system in the database core also uses the same hack.
>>> This breaks if a design document is named "ZZZZZZZZZTop" or "\u9999Iñtërnâtiônàlizætiøn". Such names might be unlikely, but we are computer scientists; "unlikely" is a bad approach to software engineering.
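To make the failure concrete, here is a small Python sketch with hypothetical document ids, assuming plain code-point ordering. It exercises only the `\u9999` sentinel: Python sorts 'Z' before 'b', so whether the ZZZZZZZZZ sentinel even covers lowercase names depends on the collation in use.

```python
# Hypothetical document ids; Python compares strings by code point.
doc_ids = [
    "_design/blog",
    "_design/ZZZZZZZZZTop",
    "_design/\u9999Iñtërnâtiônàlizætiøn",
    "post-1",
]


def select_closed(keys, startkey, endkey):
    """Right-closed selection: startkey <= k <= endkey, in sorted order."""
    return sorted(k for k in keys if startkey <= k <= endkey)


found = select_closed(doc_ids, "_design/", "_design/\u9999")
print(found)  # ['_design/ZZZZZZZZZTop', '_design/blog']
```

The design document whose name starts with `\u9999` sorts after the sentinel itself, so the guessed endkey silently drops it.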
>>> The thing we really want to ask CouchDB is to "get all documents with keys starting with '_design/'".
>>> This is basically impossible to do with right-closed intervals. We could use startkey="_design/"&endkey="_design0" ('0' is the ASCII character after '/') and this will work fine ... until there is actually a document with the key "_design0" in the system. Unlikely, but ...
>>> To make selection by intervals reliable, clients currently have to either guess the last key (the ZZZZ approach) or use the first key not to include (the _design0 approach) and then post-process the result to remove the last element returned if it exactly matches the given endkey value.
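The _design0 workaround can be sketched in Python with hypothetical ids and plain code-point ordering: query the right-closed interval up to the first key not to include, then strip matches of endkey from the end of the result. This version drops every trailing row equal to endkey rather than just one, since a view may emit the same key more than once; `drop_endkey_matches` is a hypothetical helper name.

```python
def select_closed(keys, startkey, endkey):
    """Right-closed selection: startkey <= k <= endkey, in sorted order."""
    return sorted(k for k in keys if startkey <= k <= endkey)


def drop_endkey_matches(rows, endkey):
    """Post-processing workaround: remove trailing rows whose key equals endkey."""
    end = len(rows)
    while end > 0 and rows[end - 1] == endkey:
        end -= 1
    return rows[:end]


doc_ids = ["_design/blog", "_design/users", "_design0", "post-1"]  # hypothetical ids
rows = select_closed(doc_ids, "_design/", "_design0")
print(rows)                                   # ['_design/blog', '_design/users', '_design0']
print(drop_endkey_matches(rows, "_design0"))  # ['_design/blog', '_design/users']
```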
>>> If CouchDB changed to a right-open interval approach, post-processing would go away in most cases. See http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/ for two real-world examples.
>>> At least for string keys and float keys, changing the meaning to [startkey, endkey[ would allow selections like
>>> * "all strings starting with 'abc'"
>>> * "all numbers between 10.5 and 11"
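Under the proposed semantics both selections become single queries with no post-processing. A Python sketch, with `select_right_open` as a hypothetical stand-in for the server-side range scan and plain code-point ordering assumed; the endkey for the prefix case is the prefix with its last character incremented ('c' becomes 'd').

```python
def select_right_open(keys, startkey, endkey):
    """Right-open selection: startkey <= k < endkey, returned in sorted order."""
    return sorted(k for k in keys if startkey <= k < endkey)


# "All strings starting with 'abc'": "abd" is the first key without the prefix.
words = ["ab", "abc", "abcZZZZ", "abcz", "abd"]
print(select_right_open(words, "abc", "abd"))  # ['abc', 'abcZZZZ', 'abcz']

# "All numbers between 10.5 and 11": 11 itself is excluded, no guessed maximum needed.
numbers = [10.4, 10.5, 10.75, 10.999, 11.0, 11.2]
print(select_right_open(numbers, 10.5, 11))  # [10.5, 10.75, 10.999]
```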
>>> It would also, hopefully, not break too much existing code. Since endkey already seems to be considered "fishy" (see the ZZZZZ approach), most code seems to try to avoid the issue. For example, 'startkey="_design/"&endkey="_design/ZZZZZZZZZ"' would still work unless you have a design document named exactly "ZZZZZZZZZ".
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>
