Date: Tue, 10 Feb 2009 11:33:59 -0800
Subject: A permanent view for user-entered query with complex boolean expressions?
From: Barry Wark
To: user@couchdb.apache.org

Hi all,

I'm in the planning stage for a frontend to a large data set of physiology data. I'm new to CouchDB and would like to get some feedback on the feasibility of some ideas before I dig too far into implementation.

The data:

Conceptually, the important parts of the data set can be modeled as a set of trials. Each trial has one or more stimulus settings, which are key-value pairs. Not all trials have the same set of settings, and not all trials with the same setting have the same value for that setting. CouchDB documents appear well-suited for this form of data. In addition, each trial has one or more numeric datasets, each on the order of 1MB, but up to 100MB. It seems that having CouchDB documents that contain a key-value pair like

    "parameters" : {
        "parameter1" : value1,
        "parameter2" : value2,
        //etc.
    }

and with attachments for the numeric data sets is the CouchDB way to go.

Users will want to query this data set for all trials whose settings satisfy some boolean expression. So, for example

    "trials where (parameters['parameter1'] == 10 AND parameters['parameter2'] >= 42)"

So, now a few questions:

1. Is there a way to create a permanent view that supports queries like the one above?
I got as far as a view like

    map:
    function(doc) {
        // emit one row per parameter, keyed by [name, value]
        for (var parameter in doc.parameters) {
            emit([parameter, doc.parameters[parameter]], doc._id);
        }
    }

    reduce:
    function(keys, values, rereduce) {
        if (rereduce) {
            // values is a list of ID lists from earlier reduce calls;
            // merge them into one list (the union of the partial results)
            return Array.prototype.concat.apply([], values);
        }
        return values;
    }

I believe this will give a view which, when queried with group=true, will give a set of rows keyed by [parameter, parameterValue], each with a list of the trial document IDs that have that parameter:parameterValue. Is this correct?

Given this, I could combine the values of the rows with startkey=[parameter1, 10],count=1 and startkey=[parameter2, 42] to get the set of trial document IDs that match the query. But is there a way to structure the view's map/reduce so that I don't have to combine them in my code (i.e. CouchDB does it as part of the map/reduce)? The approach outlined above leads to an HTTP GET for each term in the boolean expression, for example.

2. What is the (practical) limit on attachment size? Is it reasonable to store multi-MB attachments in the database? If not, I will go with external file(s) for the numeric data and store a reference in the trial document.

Thanks for any insight,
Barry
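
For reference, the client-side combination described in question 1 could look roughly like the sketch below. It assumes each view response has the shape {rows: [{key, value}]}, where value is a list of trial document IDs. The helper names idsForTerm and intersect are hypothetical, and the sample responses are made up for illustration rather than taken from a real database. Because the example query joins its terms with AND, the per-term ID sets are intersected here.

```javascript
// Collect every trial ID found in the rows of one view response.
// Assumed response shape: {rows: [{key: [name, value], value: [id, ...]}]}.
function idsForTerm(response) {
  const ids = new Set();
  for (const row of response.rows) {
    for (const id of row.value) {
      ids.add(id);
    }
  }
  return ids;
}

// Intersect several ID sets: an AND of terms keeps only IDs present in all of them.
function intersect(sets) {
  if (sets.length === 0) return [];
  return [...sets[0]].filter((id) => sets.every((s) => s.has(id))).sort();
}

// Made-up responses, one per term of the boolean expression.
// term1: parameter1 == 10; term2: parameter2 >= 42 (may span several rows).
const term1 = {
  rows: [{ key: ["parameter1", 10], value: ["trial-a", "trial-b"] }],
};
const term2 = {
  rows: [
    { key: ["parameter2", 42], value: ["trial-b"] },
    { key: ["parameter2", 50], value: ["trial-c"] },
  ],
};

const matching = intersect([idsForTerm(term1), idsForTerm(term2)]);
console.log(matching);
```

A range term such as parameter2 >= 42 can match several rows, so the IDs within one term are merged first and only then intersected across terms. This still costs one request per term, which is the overhead the question asks about.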