couchdb-dev mailing list archives

From "Brian Candler (JIRA)" <j...@apache.org>
Subject [jira] Created: (COUCHDB-403) User-defined GroupRowsFun
Date Tue, 07 Jul 2009 08:14:16 GMT
User-defined GroupRowsFun
-------------------------

                 Key: COUCHDB-403
                 URL: https://issues.apache.org/jira/browse/COUCHDB-403
             Project: CouchDB
          Issue Type: Wish
          Components: Database Core, HTTP Interface
            Reporter: Brian Candler
            Priority: Minor


CouchDB has hard-coded functionality for grouping. From the user's point of view: group_level=N
will truncate Array keys to the first N elements, and that's it. (*)

It would be wonderful if application-specific grouping functions could be added. Useful examples
include:

* for string keys, truncate to the first N characters (e.g. group by first 3 letters of surname)
* for numeric keys, trunc(k/N) (e.g. N=100 would give you buckets of 0..99, 100..199,
200..299 etc)
* combine with group_level: e.g. truncate array to first two elements plus the third element
divided by 100

    ["string1","string2",Number,"rest"] => ["string1","string2",trunc(Number/100)]

* for numeric keys: use trunc(log(V) * N) for exponential buckets
* for hexadecimal-string keys: right-shift N places
* ...etc

In each case N would be a parameter chosen at query time, like group_level is now.
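To make the examples above concrete, here is a minimal sketch of what such truncation functions might look like. It is written in Python purely for illustration (the real hook would be Erlang), all function names are hypothetical, and "right-shift N places" is assumed here to mean N bits on the integer value of the hex string.

```python
import math

# Hypothetical grouping functions; each takes a key and the query-time
# parameter n and returns the truncated group key.

def trunc_string(key, n):
    """String keys: keep the first n characters."""
    return key[:n]

def trunc_number(key, n):
    """Numeric keys: trunc(k/N), e.g. n=100 gives buckets 0..99, 100..199, ..."""
    return math.trunc(key / n)

def trunc_array_mixed(key, n):
    """Array keys: keep the first two elements plus the third divided by n."""
    return key[:2] + [math.trunc(key[2] / n)]

def trunc_log(key, n):
    """Positive numeric keys: trunc(log(V) * N) for exponential buckets."""
    return math.trunc(math.log(key) * n)

def trunc_hex(key, n):
    """Hex-string keys: right-shift by n bits (assumed meaning of 'places')."""
    return int(key, 16) >> n
```

For instance, trunc_array_mixed(["string1", "string2", 250, "rest"], 100) yields ["string1", "string2", 2], matching the array example above.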

It would be sufficient just to have a hook to statically link Erlang functions to do this.
There would then need to be two new HTTP parameters: one to choose the grouping function and
one for any arguments it needs.
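As a rough illustration of the two-parameter idea, the sketch below looks up a grouping function by name and binds its argument from a query string. The parameter names group_fun and group_arg, the registry, and the function names in it are all made up for this example; nothing here reflects an existing CouchDB API.

```python
import math
from urllib.parse import parse_qs

# Hypothetical registry standing in for the statically linked functions.
GROUP_FUNS = {
    "prefix": lambda key, n: key[:n],              # first n characters
    "bucket": lambda key, n: math.trunc(key / n),  # trunc(k/N)
}

def resolve_group_fun(query_string):
    """Return a one-argument truncation function chosen by the query string.

    Assumes two hypothetical parameters: group_fun (name) and group_arg (N).
    """
    params = parse_qs(query_string)
    name = params["group_fun"][0]
    arg = int(params["group_arg"][0])
    fun = GROUP_FUNS[name]
    return lambda key: fun(key, arg)
```

A request carrying group_fun=bucket&group_arg=100 would then map the key 250 to group 2.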

Theoretically this function could also be handed off to the external view server so the logic
could be written in Javascript or whatever, but I think it would be too slow in practice.

Note: group truncation functions would need to meet certain constraints to work with the
grouping logic. Something like:
   K1 <= K2 implies grouptrunc(K1) <= grouptrunc(K2)
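The reason for this constraint can be sketched as follows: if grouping only merges adjacent rows of the sorted view, a truncation function that does not preserve order can split one logical group into several. The Python below is an illustration only; it uses Python's own ordering rather than CouchDB's collation, and both helper names are hypothetical.

```python
import math
from itertools import groupby

def is_order_preserving(grouptrunc, sorted_keys):
    """Check K1 <= K2 implies grouptrunc(K1) <= grouptrunc(K2) on sample keys."""
    truncated = [grouptrunc(k) for k in sorted_keys]
    return all(a <= b for a, b in zip(truncated, truncated[1:]))

def group_counts(sorted_keys, grouptrunc):
    """Merge adjacent rows with equal truncated keys; return (group, count) pairs."""
    return [(g, len(list(rows))) for g, rows in groupby(sorted_keys, key=grouptrunc)]

keys = [5, 99, 100, 105]            # already in view order
good = lambda k: math.trunc(k / 100)  # monotone: buckets of 100
bad = lambda k: k % 100               # not monotone: wraps around
```

With these sample keys, good yields two groups (0 and 1), while bad produces the group key 5 twice, once at each end of the output.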

(*) It's not implemented exactly like that. As far as I can see, there's one function that compares
keys for equality by looking at the first N elements (GroupRowsFun), and another function that
truncates them when emitting them (RespFun). For adding bolt-on functions it would be more
convenient just to define a single group key truncation function.
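A minimal sketch of that last point, in Python for illustration: given a single truncation function, both the equality comparison and the emitted key can be derived from it. The names mirror GroupRowsFun and RespFun, but the signatures shown are assumptions and not the actual CouchDB internals.

```python
def make_group_funs(grouptrunc):
    """Derive both roles from one key truncation function (hypothetical signatures)."""

    def group_rows_fun(key_a, key_b):
        # Two rows belong to the same group if their truncated keys are equal.
        return grouptrunc(key_a) == grouptrunc(key_b)

    def resp_fun(key):
        # The key emitted for a group is simply the truncated key.
        return grouptrunc(key)

    return group_rows_fun, resp_fun

# The existing group_level=2 behaviour expressed as a truncation function.
same_group, emit_key = make_group_funs(lambda key: key[:2])
```

Here same_group(["a", "b", 1], ["a", "b", 2]) is true, and emit_key(["a", "b", 1]) gives ["a", "b"].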


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

