couchdb-dev mailing list archives

From "Paul Davis" <paul.joseph.da...@gmail.com>
Subject Re: API suggestions
Date Sun, 28 Dec 2008 21:03:23 GMT
This question keeps coming up and it occurred to me that no one ever
mentions startkey_docid and endkey_docid when talking about this
issue. In theory the docid is a prime candidate for
programmatically selecting an open or closed interval on either end of
the key range. Since a DocID is guaranteed to be a string, we can use nil
and {} to select before key and after key respectively. (For
reference, startkey_docid actually defaults to nil and endkey_docid to
{} in the implementation.)
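
To make the sentinel idea concrete, here is a minimal Python sketch. It is a model of the ordering only, not CouchDB code: the names LOW, HIGH, doc and select are hypothetical, and it assumes rows are compared as (key, docid) pairs with nil sorting before every docid string and {} sorting after.

```python
# Hypothetical model, not CouchDB code: a docid bound is a (rank, text) pair.
LOW = (0, "")    # stands in for Erlang's nil: sorts before every docid
HIGH = (2, "")   # stands in for {}: sorts after every docid


def doc(docid):
    """Wrap a real docid string so that LOW < doc(x) < HIGH for any x."""
    return (1, docid)


def select(rows, startkey, endkey, start_docid=LOW, end_docid=HIGH):
    """Return the rows whose (key, docid) pair lies between the two bounds."""
    lo = (startkey, start_docid)
    hi = (endkey, end_docid)
    return [(k, d) for (k, d) in rows if lo <= (k, doc(d)) <= hi]


rows = [("a", "d1"), ("b", "d2"), ("b", "d3"), ("c", "d4")]

closed = select(rows, "a", "b")                       # defaults: both ends closed
right_open = select(rows, "a", "b", end_docid=LOW)    # drops every row keyed "b"
left_open = select(rows, "a", "c", start_docid=HIGH)  # drops every row keyed "a"
```

With the defaults both ends are closed. Moving the end bound to LOW excludes every row whose key equals endkey, and moving the start bound to HIGH excludes every row whose key equals startkey.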

So anyway, it turns out that someone thought ahead and realized that
those keys would only be strings and used list_to_binary instead of
?JSON_DECODE so this idea doesn't work automagically. But here's a
patch that shows the idea. (Only the idea though; to make this a real
patch we'd need to alter a bit of logic for when descending=true.)

Index: src/couchdb/couch_httpd_view.erl
===================================================================
--- src/couchdb/couch_httpd_view.erl    (revision 729771)
+++ src/couchdb/couch_httpd_view.erl    (working copy)
@@ -222,9 +222,23 @@
                 throw({query_parse_error, Msg})
             end;
         {"startkey_docid", DocId} ->
-            Args#view_query_args{start_docid=list_to_binary(DocId)};
+            case DocId of
+            "true" ->
+                Args#view_query_args{start_docid=nil};
+            "false" ->
+                Args#view_query_args{start_docid={}};
+            _ ->
+                Args#view_query_args{start_docid=list_to_binary(DocId)}
+            end;
         {"endkey_docid", DocId} ->
-            Args#view_query_args{end_docid=list_to_binary(DocId)};
+            case DocId of
+            "true" ->
+                Args#view_query_args{end_docid=nil};
+            "false" ->
+                Args#view_query_args{end_docid={}};
+            _ ->
+                Args#view_query_args{end_docid=list_to_binary(DocId)}
+            end;
         {"startkey", Value} ->
             case Keys of
             nil ->

Anyone got arguments against?

Paul

On Sun, Dec 28, 2008 at 9:12 AM,  <md@hudora.de> wrote:
> While writing something about using CouchDB I came across the issue of
> "slice indexes" (called startkey and endkey in CouchDB lingo).
>
> I found no exact definition of startkey and endkey anywhere in the
> documentation. Testing reveals that for both _all_docs and views,
> documents are returned in the closed interval
>
> [startkey, endkey] = (startkey <= k <= endkey).
>
> I don't know if this was a conscious design decision. But I'd like to
> propose a slightly different interpretation (and thus an API change):
>
> [startkey, endkey[ = (startkey <= k < endkey).
>
>
> Both approaches are valid and used in the real world. Ruby uses the
> inclusive ("right-closed" in math speak) first approach:
>
> >> l = [1,2,3,4]
> >> l.slice(1..2)
> => [2, 3]
>
>
> Python uses the exclusive ("right-open" in math speak) second approach:
>
> >>> l = [1,2,3,4]
> >>> l[1:2]
> [2]
>
>
> For array indices both work fine, and which one to prefer is mostly a
> matter of habit. In spoken language both approaches are used: "Have the
> software done by Saturday" probably means right-open to the client and
> right-closed to the coder.
>
> But if you are working with keys that are more than array indexes, then
> right-open is much easier to handle. That is because with a right-closed
> interval you have to *guess* the biggest value you want to get. The Wiki
> at http://wiki.apache.org/couchdb/View_collation contains an example of
> that problem:
>
> It is suggested that you use
> startkey="_design/"&endkey="_design/ZZZZZZZZZ"
> or
> startkey="_design/"&endkey="_design/\u9999"
> to get a list of all design documents
>
> This breaks if a design document is named "ZZZZZZZZZTop" or
> "\u9999Iñtërnâtiônàlizætiøn". Such names might be unlikely, but we are
> computer scientists; "unlikely" is a bad approach to software
> engineering.
>
> What we really want to ask CouchDB is to "get all documents with keys
> starting with '_design/'".
>
> This is basically impossible to do with right-closed intervals. We could
> use startkey="_design/"&endkey="_design0" ('0' is the ASCII character
> after '/') and this will work fine ... until there is actually a
> document with the key "_design0" in the system. Unlikely, but ...
>
> To make selection by intervals reliable, clients currently have to
> either guess the last key (the ZZZZ approach) or use the first key not
> to include (the _design0 approach) and then post-process the result to
> remove the last element returned if it exactly matches the given endkey
> value.
>
>
> If CouchDB changed to a right-open interval approach, post-processing
> would go away in most cases. See
> http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/
> for two real-world examples.
>
> At least for string keys and float keys, changing the meaning to
> [startkey, endkey[ would allow selections like
>
> * "all strings starting with 'abc'"
> * all numbers between 10.5 and 11
>
> It would also hopefully not break too much existing code. Since the
> notion of endkey already seems to be considered "fishy" (see the ZZZZZ
> approach), most code seems to try to avoid that issue. For example,
> 'startkey="_design/"&endkey="_design/ZZZZZZZZZ"' would still work unless
> you have a design document named exactly "ZZZZZZZZZ".
>
> Regards
>
> Maximillian Dornseif
>
>
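
The prefix-selection argument in the quoted message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: keys are compared by plain code point order (CouchDB's actual collation rules differ), the sample keys are made up, and closed_range, right_open_range and prefix_end are hypothetical helpers, not part of any CouchDB API.

```python
def closed_range(keys, startkey, endkey):
    """Right-closed selection: startkey <= k <= endkey."""
    return [k for k in sorted(keys) if startkey <= k <= endkey]


def right_open_range(keys, startkey, endkey):
    """Right-open selection: startkey <= k < endkey."""
    return [k for k in sorted(keys) if startkey <= k < endkey]


def prefix_end(prefix):
    """First key that no longer starts with prefix, in code point order.

    Assumes a non-empty prefix whose last character is not the maximum
    code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


# Made-up sample keys, including the awkward names from the message.
keys = [
    "_design/Foo",
    "_design/ZZZZZZZZZTop",
    "_design/a",
    "_design/\u9999x",
    "_design0",
    "doc1",
]

# Guessing the last key misses design documents that sort after the guess.
guessed = closed_range(keys, "_design/", "_design/ZZZZZZZZZ")

# A closed interval ending at "_design0" returns that document too,
# so the client has to strip it afterwards.
closed = closed_range(keys, "_design/", prefix_end("_design/"))

# A right-open interval returns exactly the keys with the prefix.
design_docs = right_open_range(keys, "_design/", prefix_end("_design/"))
```

Here prefix_end("_design/") evaluates to "_design0", the same bound the message derives by hand. Only the right-open variant returns the four "_design/" keys without guessing an upper key or post-processing the result.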
