couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Stockton <chrisstockto...@gmail.com>
Subject Views - Finding the column count for a given "table"
Date Tue, 13 Apr 2010 17:21:59 GMT
Hello,

We have created a data-store for users, one big difference with this
table is that a specific document may not have the column defined in
its "data" object. This means as users add new columns we do not need
to back fill all the rows of the "table" with empty data. Couch is
working very well for us for this reason but I have become stumbled
with getting the counts of rows which contain columns for a given
"table".

Given a set of documents that look like:
{"table":"table1","data":{"a":1,"b":1,"c":1}} -> Table row, with a
column A, B and C. This is the tables defined full schema.
{"table":"table1","data":{"a":1,"c":1}} -> Here the user never put data into B.
{"table":"table1","data":{"a":1}}
{"table":"table1","data":{"c":1}}  -> Table row, with only one column,
C, the other 2 columns are NULL.
{"table":"table2","data":{"i":1}}
{"table":"table2","data":{"i":1,"o":1,"p":1}}
{"table":"table2","data":{"i":1,"o":1,"p":1}}

I need to turn them into a view which gives counts of occurrence
grouped by table and each data key, So I may look them up giving them
the table & column. Here is what I have currently:
{"rows":[
{"key":["table1","a"],"value":3},
{"key":["table1","b"],"value":1},
{"key":["table1","c"],"value":3},
{"key":["table2","i"],"value":3},
{"key":["table2","o"],"value":2},
{"key":["table2","p"],"value":2}
]}

The following MapReduce works fine on the above data set:
function(doc) {
  for(key in doc.data) {
    emit([doc.table, key], null);
  }
}

function(keys, values, rereduce) {
  if(rereduce) {
    return sum(values);
  } else {
    return values.length;
  }
}

However, when I put this view into tables with hundreds of rows I
start getting unexpected (well maybe expected, but not understood)
results. Mostly I just end up with many columns (I.E. a, b, c) being
the MAX count of all rows in the table. Any help getting my head
around thinking about this in the correct way would be much
appreciated.

-Chris

Mime
View raw message