incubator-couchdb-user mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject The need for a key prefix view parameter
Date Thu, 30 Apr 2009 13:23:17 GMT
Unless I am missing something obvious here, I think CouchDB doesn't provide
me with any way to query for an exact key prefix.

(1) Goal: to show all keys in a view which start with exactly "Abc"

(2) The problem: the Unicode Collation Algorithm doesn't let me do this
using a range query.

For example, consider the query startkey="Abc"&endkey="AbcZZZZZZZZ"

This will also match "ABC" and "abc1" (but not "abc"). This is correct UCA
behaviour, but not useful for finding this key prefix. See script 1 below.
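
The effect can be reproduced outside CouchDB. The sketch below assumes that
JavaScript's Intl.Collator (ICU-backed in Node.js) orders these ASCII keys the
same way the view collation does; that equivalence is an assumption, and the
helper name rangeQuery is made up for illustration.

```javascript
// Approximate a view range query with an ICU collator.
// Assumption: Intl.Collator("en") orders these keys like CouchDB's view collation.
var collator = new Intl.Collator("en");

var keys = ["abc", "abc1", "aBC", "Abc", "Abc2", "ABC", "abd"];

// Hypothetical helper: keep keys with startkey <= key <= endkey,
// returned in collation order.
function rangeQuery(allKeys, startkey, endkey) {
  return allKeys
    .filter(function (k) {
      return collator.compare(k, startkey) >= 0 &&
             collator.compare(k, endkey) <= 0;
    })
    .sort(collator.compare);
}

// Includes "ABC" and "abc1" in addition to the wanted "Abc" and "Abc2".
console.log(rangeQuery(keys, "Abc", "AbcZZZZZZZZ"));
```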

Right now the only workaround I can see is gross: emitting hex pairs in the
view, e.g. "416263" instead of "Abc".
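
A minimal sketch of that hex workaround follows. The helper name hexKey is
hypothetical, and it assumes keys are encoded as UTF-8 with lowercase hex
digits so that digits and the letters a-f keep their byte order under UCA. In
a map function it would be used as emit(hexKey(doc.var), null).

```javascript
// Hypothetical helper: encode a string as lowercase hex pairs of its UTF-8 bytes.
function hexKey(s) {
  // encodeURIComponent + unescape yields one character per UTF-8 byte.
  var utf8 = unescape(encodeURIComponent(s));
  var out = "";
  for (var i = 0; i < utf8.length; i++) {
    var h = utf8.charCodeAt(i).toString(16);
    out += (h.length < 2 ? "0" : "") + h;
  }
  return out;
}

// Range bounds for an exact prefix. Assumption: "g" sorts after every hex
// digit, so prefix + "g" lies above every key that extends the prefix.
var startkey = hexKey("Abc");
var endkey = startkey + "g";
console.log(startkey, endkey);
```

The cost is that the emitted keys are no longer human-readable, and the client
has to decode them again.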

Aside: this also demonstrates that the only sensible way to use
startkey/endkey is to lowercase the startkey and to uppercase the end key.
For example:

    startkey="abc"&endkey="ABCZZZZZZZZ"

will give me all keys which start with [aA][bB][cC] in any case combination.
This is of course useful in many applications.

However if you use a startkey which is not all lowercase, then you will get
some keys which match 'abc' in some combinations of case, but not all of
them, which isn't very helpful.

(3) An alternative is to lowercase startkey and endkey and exclude the end
key with inclusive_end=false, e.g.

    startkey="abc"&endkey="abd"&inclusive_end=false

but this is not simple, as you have to know the 'next' character in the UCA
sequence; for example "9" is followed by "a". Furthermore, you still need a
character that sorts after "Z" for prefixes ending in that letter, and we're
still only getting case-insensitive matching.
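
The case-insensitive ranges from the aside and from (3) can be compared side
by side. This is only a sketch: it assumes Intl.Collator("en") approximates
the view collation for these ASCII keys, and rangeQuery with its inclusiveEnd
argument is a made-up stand-in for the startkey/endkey/inclusive_end
parameters.

```javascript
// Assumption: Intl.Collator("en") orders these keys like CouchDB's view collation.
var collator = new Intl.Collator("en");
var keys = ["abc", "abc1", "aBC", "Abc", "Abc2", "ABC", "abd"];

// Hypothetical helper; inclusiveEnd mirrors the inclusive_end view parameter.
function rangeQuery(allKeys, startkey, endkey, inclusiveEnd) {
  return allKeys
    .filter(function (k) {
      var end = collator.compare(k, endkey);
      return collator.compare(k, startkey) >= 0 &&
             (inclusiveEnd ? end <= 0 : end < 0);
    })
    .sort(collator.compare);
}

// Lowercase start, uppercase end: every case combination of the prefix.
var aside = rangeQuery(keys, "abc", "ABCZZZZZZZZ", true);
// Lowercase start, 'next' key as exclusive end.
var nextChar = rangeQuery(keys, "abc", "abd", false);
// Mixed-case start: some case combinations are silently dropped.
var mixed = rangeQuery(keys, "Abc", "ABCZZZZZZZZ", true);

console.log(aside, nextChar, mixed);
```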

(4) I think a prefix option could also help with array key ranges. For
example, startkey=["foo"]&endkey=["foo",{}] is a common idiom which covers
most keys which have "foo" in the first slot, but it doesn't match
["foo",{"some":"object"}] - see script 2 below. Having prefix=["foo"] could
match everything.
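
To make the intended semantics concrete, here is one possible reading of such
a prefix option, written as a client-side filter. CouchDB has no such
parameter; the function name hasPrefix and the choice to compare array
elements by their JSON text are assumptions for illustration only.

```javascript
// Hypothetical semantics for a "prefix" option; not an existing CouchDB feature.
function hasPrefix(key, prefix) {
  if (typeof prefix === "string") {
    // Exact, case-sensitive match on the leading characters.
    return typeof key === "string" && key.indexOf(prefix) === 0;
  }
  if (Array.isArray(prefix)) {
    if (!Array.isArray(key) || key.length < prefix.length) return false;
    for (var i = 0; i < prefix.length; i++) {
      // Assumption: two elements are equal when their JSON text is equal.
      if (JSON.stringify(key[i]) !== JSON.stringify(prefix[i])) return false;
    }
    return true;
  }
  return false;
}

console.log(hasPrefix(["foo", {"some": "object"}], ["foo"])); // true
console.log(hasPrefix("Abc2", "Abc"));                        // true
console.log(hasPrefix("ABC", "Abc"));                         // false
```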

(5) Strangely, doc id keys in _all_docs appear to behave differently;
perhaps they are ASCII-compared rather than UCA compared. See script 3
below.
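
If the guess about ASCII comparison is right, the _all_docs result is what a
plain code-unit string comparison would give. The sketch below uses
JavaScript's built-in string operators, which compare by UTF-16 code unit and
so agree with ASCII order for these ids; asciiRange is a made-up helper name.

```javascript
var ids = ["abc", "abc1", "aBC", "Abc", "Abc2", "ABC", "abd"];

// Hypothetical helper: range filter using plain code-unit comparison.
function asciiRange(allIds, startkey, endkey) {
  return allIds
    .filter(function (id) { return id >= startkey && id <= endkey; })
    .sort(); // default sort also compares by code unit
}

console.log(asciiRange(ids, "Abc", "AbcZZZZZZZZ")); // [ 'Abc', 'Abc2' ]
```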

Regards,

Brian.


----------------- 1. example using map keys -----------------
DB="http://127.0.0.1:5984/test"
curl -X DELETE "$DB"
curl -X PUT "$DB"
curl -T - -X POST "$DB/_bulk_docs" <<JSON
{"docs":[
{"var":"abc"},
{"var":"abc1"},
{"var":"aBC"},
{"var":"Abc"},
{"var":"Abc2"},
{"var":"ABC"},
{"var":"abd"}
]}
JSON

curl -T - -X PUT "$DB/_design/myview" <<JSON
{
  "views":{
    "v":{
      "map":"
      function(doc) {
        if (doc.var) { emit(doc.var,null); }
      }"
    }
  }
}
JSON

curl "$DB/_design/myview/_view/v?startkey=%22Abc%22&endkey=%22AbcZZZZZZZZ%22"
# This returns 4 rows, but I only wanted Abc and Abc2

-------------- 2. example with array key --------------
DB="http://127.0.0.1:5984/test"
curl -X DELETE "$DB"
curl -X PUT "$DB"
curl -T - -X POST "$DB/_bulk_docs" <<JSON
{"docs":[
{"var":["abc"]},
{"var":["abc",123]},
{"var":["abc",{"foo":"bar"}]}
]}
JSON

curl -T - -X PUT "$DB/_design/myview" <<JSON
{
  "views":{
    "v":{
      "map":"
      function(doc) {
        if (doc.var) { emit(doc.var,null); }
      }"
    }
  }
}
JSON

curl "$DB/_design/myview/_view/v?startkey=%5B%22abc%22%5D&endkey=%5B%22abc%22,%7B%7D%5D"
# This returns ["abc"] and ["abc",123], but not ["abc",{"foo":"bar"}]

-------------- 3. example using doc ids in _all_docs --------------
DB="http://127.0.0.1:5984/test"
curl -X DELETE "$DB"
curl -X PUT "$DB"
curl -T - -X POST "$DB/_bulk_docs" <<JSON
{"docs":[
{"_id":"abc"},
{"_id":"abc1"},
{"_id":"aBC"},
{"_id":"Abc"},
{"_id":"Abc2"},
{"_id":"ABC"},
{"_id":"abd"}
]}
JSON

curl "$DB/_all_docs?startkey=%22Abc%22&endkey=%22AbcZZZZZZZZ%22"
# This returns the 2 rows I wanted

