incubator-couchdb-user mailing list archives

From Barry Wark <barryw...@gmail.com>
Subject A permanent view for user-entered query with complex boolean expressions?
Date Tue, 10 Feb 2009 19:33:59 GMT
Hi all,

I'm in the planning stage for a frontend to a large data set of
physiology data. I'm new to CouchDB and would like to get some
feedback on the feasibility of some ideas before I dig too far into
implementation.

The data:
Conceptually, the important parts of the data set can be modeled as a
set of trials. Each trial has one or more stimulus settings which are
key-value pairs. Not all trials have the same set of settings and not
all trials with the same setting have the same value for that setting.
CouchDB documents appear well-suited for this form of data. In
addition, each trial has one or more numeric datasets, each on the
order of 1 MB but up to 100 MB. It seems that having CouchDB documents
that contain a key-value pair like

"parameters" : {
    "parameter1" : value1,
    "parameter2" : value2,
    //etc.
}

and with attachments for the numeric data sets is the CouchDB way to go.

Users will want to query this data set for all trials whose settings
satisfy some boolean expression, for example "trials where
(parameters['parameter1'] == 10 AND parameters['parameter2'] >= 42)".
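
To make the intended semantics concrete, here is a minimal sketch of that
example query evaluated in plain JavaScript, outside CouchDB. The trial
documents, their IDs, and the name matchesQuery are made up for
illustration; only the "parameters" layout comes from the description
above.

```javascript
// Hypothetical trial documents using the "parameters" layout described above.
const trials = [
  { _id: "trial-1", parameters: { parameter1: 10, parameter2: 50 } },
  { _id: "trial-2", parameters: { parameter1: 10, parameter2: 7 } },
  { _id: "trial-3", parameters: { parameter2: 99 } }, // no parameter1 at all
];

// The example query: parameter1 == 10 AND parameter2 >= 42.
// A missing parameter reads as undefined, so both comparisons are false for it.
function matchesQuery(doc) {
  const p = doc.parameters || {};
  return p.parameter1 === 10 && p.parameter2 >= 42;
}

const matchingIds = trials.filter(matchesQuery).map((doc) => doc._id);
console.log(matchingIds); // [ 'trial-1' ]
```

Trials that lack one of the parameters simply do not match, which is the
behavior I would want from the view-based version as well.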

So, now a few questions:

1. Is there a way to create a permanent view that supports queries
like that above? I got as far as a view like

map:
function map(doc) {
    // emit one row per parameter, keyed by [name, value]
    for (var parameter in doc.parameters) {
        emit([parameter, doc.parameters[parameter]], doc._id);
    }
}

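reduce:
function reduce(keys, values, rereduce) {
    if (rereduce) {
        // union() is a placeholder for a helper that merges the
        // lists of document IDs; it is not defined here
        return union(values);
    }

    return values;
}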

I believe this will give a view which, when queried with group=true,
returns a set of rows keyed by [parameter, parameterValue], each with
a list of the trial document IDs that have that
parameter:parameterValue. Is this correct?
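
As a rough check of what I expect those rows to look like, here is a
small in-memory simulation in plain JavaScript. It does not talk to
CouchDB: emit() is a stand-in that collects rows in an array, the
grouping loop only mimics what I assume group=true does, and the two
sample documents are invented. Row order here is insertion order, not
CouchDB's key collation order.

```javascript
// Stand-in for CouchDB's emit(); collects rows in memory for this sketch only.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The map function from the view, with valid for...in syntax.
function map(doc) {
  for (const parameter in doc.parameters) {
    emit([parameter, doc.parameters[parameter]], doc._id);
  }
}

// Hypothetical trial documents.
const docs = [
  { _id: "trial-1", parameters: { parameter1: 10, parameter2: 50 } },
  { _id: "trial-2", parameters: { parameter1: 10, parameter2: 7 } },
];
docs.forEach(map);

// Collect the document IDs of all rows that share exactly the same key.
const grouped = new Map();
for (const { key, value } of rows) {
  const k = JSON.stringify(key);
  if (!grouped.has(k)) grouped.set(k, []);
  grouped.get(k).push(value);
}

const result = [...grouped].map(([k, ids]) => ({ key: JSON.parse(k), value: ids }));
console.log(JSON.stringify(result, null, 2));
```

In this simulation the key ["parameter1", 10] ends up with both trial
IDs, while ["parameter2", 50] and ["parameter2", 7] each get one. Note
that the reduced value grows with the number of matching documents; I
have not checked how well CouchDB copes with that.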

Given this, I could combine the values of the rows returned for
startkey=[parameter1, 10],count=1 and for startkey=[parameter2, 42]
(intersecting them for an AND, taking their union for an OR) to get
the set of trial document IDs that match the query.

But is there a way to structure the view's map/reduce so that I don't
have to combine the sets in my code (i.e. CouchDB does it as part of
the map/reduce)? The approach outlined above leads to an HTTP GET for
each term in the boolean expression, for example.
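
For reference, the client-side combining step I would like to avoid
could be sketched as follows. The ID lists are invented and stand for
the values that would come back from one view request per term; no
HTTP requests are made here. An AND between two terms corresponds to
intersecting their ID lists, and an OR to taking their union.

```javascript
// Hypothetical ID lists, one per term of the boolean expression.
const idsForTerm1 = ["trial-1", "trial-2", "trial-5"];
const idsForTerm2 = ["trial-1", "trial-3", "trial-5"];

// IDs present in both lists (term1 AND term2).
function intersect(a, b) {
  const inB = new Set(b);
  return a.filter((id) => inB.has(id));
}

// IDs present in either list, without duplicates (term1 OR term2).
function union(a, b) {
  return [...new Set([...a, ...b])];
}

const andMatches = intersect(idsForTerm1, idsForTerm2);
const orMatches = union(idsForTerm1, idsForTerm2);

console.log(andMatches); // [ 'trial-1', 'trial-5' ]
console.log(orMatches);  // [ 'trial-1', 'trial-2', 'trial-5', 'trial-3' ]
```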

2. What is the (practical) limit on attachment size? Is it reasonable
to store multi-MB attachments in the database? If not, I will keep
the numeric data in external files and store a reference to them in
the trial document.

Thanks for any insight,

Barry
