Date: Sun, 28 Mar 2010 00:09:49 -0300
Subject: Lame ad-hoc querying proposal: Map/Filter/Reduce
From: Daniel Itaboraí
To: user@couchdb.apache.org

First of all, excuse my newbieness. I've been curious about CouchDB for a while now and only recently started to tinker with it. One thing that caught my attention was the querying part. I really like the whole map/reduce idea for building views, but after reading a little bit about Raindrop's megaview, I was wondering whether there isn't an easier way to do ad-hoc queries (other than temporary views).

If we look at Couch's views more or less like indexes, we should be able to pass a user-defined filter function whenever querying a view. That function would be executed after the map phase and before the reduce phase, for each document. It would receive as arguments the key and value emitted by the map function, as well as the document when include_docs=true. It would return a boolean value indicating whether the map output should be filtered out or not.
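
To make the idea concrete, here is a rough sketch of the flow in Python rather than in a real query server language. Everything in it is hypothetical: `map_filter_reduce`, `map_fn`, `filter_fn` and `reduce_fn` are illustrative names, not CouchDB APIs, and it assumes the filter returns True for rows that should be kept.

```python
def map_filter_reduce(docs, map_fn, filter_fn, reduce_fn, include_docs=False):
    """Hypothetical map -> filter -> reduce pass over a set of documents.

    map_fn(doc) yields (key, value) pairs, like emit() in a view.
    filter_fn(key, value, doc) returns True to keep the row (an assumption;
    the proposal leaves the polarity of the boolean open).
    reduce_fn(values) folds the surviving values into one result.
    """
    kept_values = []
    for doc in docs:
        for key, value in map_fn(doc):
            # The document is only handed to the filter when include_docs is set.
            row_doc = doc if include_docs else None
            if filter_fn(key, value, row_doc):
                kept_values.append(value)
    # The reduce runs only over rows that survived the filter.
    return reduce_fn(kept_values)


def posts_by_id(doc):
    """Example map function: emit (id, votes) for documents of type 'post'."""
    if doc.get("type") == "post":
        yield doc["_id"], doc["votes"]


docs = [
    {"_id": "a", "type": "post", "votes": 3},
    {"_id": "b", "type": "post", "votes": 10},
    {"_id": "c", "type": "comment", "votes": 7},
]

# Ad-hoc query: total votes over posts with at least 5 votes.
total = map_filter_reduce(docs, posts_by_id, lambda key, value, doc: value >= 5, sum)
```

Nothing in this sketch uses precomputed reductions: the reduce is applied from scratch to whatever rows pass the filter.
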
If the user specifies this filter function, then the intermediate reductions stored in the B-tree would have to be ignored while processing the request. (I don't know how big the performance hit from ignoring these reductions would be, but it seems to me that it would be linear in the number of documents retrieved, which can be OK in a lot of situations.)

I think this is far from perfect. There are a lot of flaws: these "filter" functions could take too long to run (or potentially enter an infinite loop), their compilation or interpretation could take too long, and they tie the client to a specific query server language (maybe the client should also specify the query server language as a mandatory argument). Above all else, it's just plain ugly to construct these dynamic functions and pass them around. Despite all that, I think there should be a way to do ad-hoc querying other than retrieving a whole bunch of documents and discarding them on the client side.

At the heart of this issue, I think, is the fact that whenever there's a time vs. space trade-off, CouchDB tends to sacrifice disk space. I believe that given Couch's design goals, that's definitely the right choice, but some form of ad-hoc querying must be supported, even if it's kinda kludgy.

I really appreciate the work that you guys have been doing and I firmly believe that Couch will be an even bigger success than it is today. I think the work done on MVCC and replication really laid an awesome foundation on which many great things can and definitely will be built.

regards,
Daniel Itaboraí