incubator-couchdb-user mailing list archives

From: Brian Candler <B.Cand...@pobox.com>
Subject: Re: Ordering of keys into reduce function
Date: Wed, 13 May 2009 16:58:27 GMT
On Wed, May 13, 2009 at 12:50:20PM +0100, Brian Candler wrote:
> I want to write a reduce function which, when reducing over a range of keys,
> gives the minimum and maximum *key* found in that range. (*)
> 
> This could be done very easily and efficiently if I could rely on the
> following two properties:
> 
> (1) keys/values passed into a first reduce are in increasing order of key
> 
> (2) reduced values passed into a re-reduce are for increasing key ranges
> 
> The question is, can I rely on both of these properties?

To answer my own question: experimentation shows clearly that I can't.

Here's my test code, with the efficient reduce function I wanted to use:

---- 8< ----
require 'rubygems'
require 'restclient'
require 'json'

DB="http://127.0.0.1:5984/test"
RestClient.delete DB rescue nil
RestClient.put DB, {}.to_json

# Build 150 documents whose foo keys fall into three bands:
# 10..500, 1010..1500 and 2010..2500 (all multiples of 10)
docs = []
(1..50).each do |i|
  docs << {"foo" => i*10}
  docs << {"foo" => i*10 + 1000}
  docs << {"foo" => i*10 + 2000}
end
RestClient.post "#{DB}/_bulk_docs", {'docs'=>docs}.to_json

RestClient.put "#{DB}/_design/test", {
  "views" => {
    "test" => {
      "map" => <<-MAP,
        function(doc) {
          if (doc.foo) { emit(doc.foo,null); }
        }
      MAP
      "reduce" => <<-REDUCE,
        function(ks, vs, co) {
          if (co) {
            var c = 0;
            for (var k in vs) { c += vs[k].count; }
            return {
              count: c,
              min:   vs[0].min,
              max:   vs[vs.length-1].max,
            }
          } else {
            return {
              count: ks.length,
              min:   ks[0][0],
              max:   ks[ks.length-1][0],
            }
          }
        }
      REDUCE
    }
  }
}.to_json

puts "\nreduce across all says:"
puts RestClient.get("#{DB}/_design/test/_view/test")

puts "\nreduce across 25..55 says:"
puts RestClient.get("#{DB}/_design/test/_view/test?startkey=25&endkey=55")

puts "\nreduce across 2385..2405 says:"
puts RestClient.get("#{DB}/_design/test/_view/test?startkey=2385&endkey=2405")

puts "\nreduce across 15..2405 says:"
puts RestClient.get("#{DB}/_design/test/_view/test?startkey=15&endkey=2405")
---- 8< ----

The output I get is:

---- 8< ----
reduce across all says:
{"rows":[
{"key":null,"value":{"count":150,"min":240,"max":10}}
]}

reduce across 25..55 says:
{"rows":[
{"key":null,"value":{"count":3,"min":50,"max":30}}
]}

reduce across 2385..2405 says:
{"rows":[
{"key":null,"value":{"count":2,"min":2400,"max":2390}}
]}

reduce across 15..2405 says:
{"rows":[
{"key":null,"value":{"count":139,"min":240,"max":20}}
---- 8< ----

So for small key ranges (i.e. those handled by a single reduce call, with
no re-reduce), the keys appear to be passed into the function in *reverse*
order. Swapping min and max fixes that case. But the function still breaks
for large ranges (in particular the last one, 15..2405), so it seems I
can't rely on any particular ordering of the reduced values passed into
the re-reduce function either.

Never mind. Back to a traditional comparison max/min function.
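
For the record, an order-independent version might look something like
this (an untested sketch: it compares every key and every intermediate
value instead of trusting their positions):

---- 8< ----
function(ks, vs, co) {
  if (co) {
    // re-reduce: fold the intermediate values without assuming
    // anything about the order they arrive in
    var acc = {count: 0, min: null, max: null};
    for (var i = 0; i < vs.length; i++) {
      acc.count += vs[i].count;
      if (acc.min === null || vs[i].min < acc.min) acc.min = vs[i].min;
      if (acc.max === null || vs[i].max > acc.max) acc.max = vs[i].max;
    }
    return acc;
  } else {
    // first reduce: each entry of ks is a [key, docid] pair, so
    // compare every key rather than relying on ks[0] and ks[ks.length-1]
    var acc = {count: ks.length, min: ks[0][0], max: ks[0][0]};
    for (var i = 1; i < ks.length; i++) {
      if (ks[i][0] < acc.min) acc.min = ks[i][0];
      if (ks[i][0] > acc.max) acc.max = ks[i][0];
    }
    return acc;
  }
}
---- 8< ----

The cost is a linear scan per call, but a reduce walks its input anyway,
so nothing is lost by giving up the positional shortcut.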

Regards,

Brian.
