From: "Ralf Nieuwenhuijsen" <ralf.nieuwenhuijsen@gmail.com>
To: couchdb-user@incubator.apache.org
Date: Tue, 19 Aug 2008 10:16:55 +0200
Subject: Re: flexible filtering needed, with speed.

Don't take Futon as a speed measure, since it might also be slowing down in the rendering part if your documents are big (there is a lot of stuff going on client-side as well).

The truth is, for all the data that is being searched, people only care about 3-5 different types of search. You can, of course, go nuts with the indexing and just generate every possible index you could possibly need. Here is one of my favorites; it creates an index for every unique field:

    function(doc) {
      for (var k in doc) {
        emit([k, 1, doc[k]], doc);
      }
    }

You can query it with startkey=["someField", 1, null] and endkey=["someField", 2, null] to get the index for 'someField'. Of course, this baby is going to create a huge index if used with too many or too-big documents, but I would at least try something like that. I use the above view function to make sure I can get the data sorted however I want.
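To make the key layout concrete, here is a small stand-alone sketch (plain JavaScript with a stubbed emit, run outside CouchDB; the sample document is made up) of what that map function emits, and why the [field, 1, ...] to [field, 2, ...] range selects exactly one field's rows:

```javascript
// Stub of CouchDB's emit(): collect rows instead of writing to an index.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// The generic one-index-per-field map function from above.
function map(doc) {
  for (var k in doc) {
    emit([k, 1, doc[k]], doc);
  }
}

map({ _id: "abc", state: "GA", breakfast: true });

// Every field gets its own row, keyed [fieldName, 1, fieldValue].
// Because 1 sorts before 2, the range startkey=["state", 1, null] to
// endkey=["state", 2, null] covers all "state" rows and nothing else.
var stateRows = rows.filter(function (r) {
  return r.key[0] === "state";
});
console.log(stateRows[0].key); // ["state", 1, "GA"]
```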
2008/8/19 Brad Anderson:
> Howdy,
>
> I have 12K docs that look like this:
>
> {
>   "_id": "000111bf7a8515da822b05ebbb8cd257",
>   "_rev": "94750440",
>   "month": 17,
>   "store": {
>     "store_num": 123,
>     "city": "Atlanta",
>     "state": "GA",
>     "zip": "30301",
>     "exterior": true,
>     "interior": true,
>     "restroom": true,
>     "breakfast": true,
>     "sunday": true,
>     "adi_name": "Atlanta, GA",
>     "adi_num": 123,
>     "ownership": "Company",
>     "playground": "Indoor",
>     "seats": 123,
>     "parking_spaces": 123
>   },
>   "raw": {
>     "Other Hourly Pay": 0.28,
>     "Workers Comp - State Funds Exp": 401.65,
>     "Rent Expense - Company": -8,
>     "Archives Expense": 82.81,
>     "Revised Hours allowed per": 860.22,
>     "Merch Standard": 174.78,
>     "Total Property Tax": 1190.91
>     ...
>   }
> }
>
> I truncated 'raw', but it's usually much longer, and the average doc size is 5K.
>
> I'm trying to see how I will query them with views. I want to be able to
> filter down by various store subfields, i.e. all the breakfast = true stores
> in Georgia that are owned by franchisees. However, this will differ for
> just about every query.
>
> The 'reduce' function would then average each line in the 'raw' field.
>
> I have played around with views that take the store filters, but just
> returning the 'raw' field as the value from the map function is brutally
> slow in Futon. This is because the view is accessed right away, so it
> builds, which takes about 3-4 minutes (on a MBP with 4 GB RAM, 2.2 GHz
> dual core, 7200 RPM disk). I understand that the next time this specific
> store group is requested it's fast... but the queries will all be so
> dynamic that this seems prohibitively slow.
>
> So, I thought, should I be doing this in two steps? Set up the key to be
> store and whatever else I might want to query on (month or whatever
> timeframe), and return the doc ids as the values from the original query?
> I would then send in a complex key to do the filtering.
> This would require waiting for the _bulk_get functionality, and I'd send
> that list of ids into a 2nd query to get the raw data to send to 'map'.
>
> This is slow now on 12K docs... It needs to be stupid-fast at that low
> number of docs, because the plan is for *way* more data.
>
> The filtering part is tailor-made for an RDBMS, but the doc handling (all
> the 'raw' fields will be different store-by-store, industry-by-industry,
> will change over time, and in general be free-form) is perfect for CouchDB.
> Thoughts? I want to use the right tool for the job, and that's looking like
> an RDBMS, sadly. That is, unless I'm completely misusing Couch, in which
> case swift blows to the head are welcome.
>
> Cheers,
> BA
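For the filtering-and-averaging problem in the quoted message, one view per filter combination could be sketched like this (again plain JavaScript with a stubbed emit; the hand-rolled grouping loop stands in for group=true, the field names are borrowed from the sample document, and the averaging reduce is simplified: a real CouchDB reduce also has to handle rereduce, e.g. by carrying {sum, count} rather than a plain average):

```javascript
// Stub of CouchDB's emit(): collect rows instead of writing to an index.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map: key on the filterable store fields plus the 'raw' line name,
// emitting the line's value. One row per 'raw' line per document.
function map(doc) {
  for (var name in doc.raw) {
    emit([doc.store.state, doc.store.ownership, doc.store.breakfast, name],
         doc.raw[name]);
  }
}

map({ store: { state: "GA", ownership: "Franchise", breakfast: true },
      raw: { "Archives Expense": 82.81, "Merch Standard": 174.78 } });
map({ store: { state: "GA", ownership: "Franchise", breakfast: true },
      raw: { "Archives Expense": 100.00 } });

// Simplified reduce: average the values that share a key.
function average(values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) sum += values[i];
  return sum / values.length;
}

// Group rows by key, as group=true would, then reduce each group.
var groups = {};
rows.forEach(function (r) {
  var k = JSON.stringify(r.key);
  (groups[k] = groups[k] || []).push(r.value);
});

// Averages the two "Archives Expense" lines (82.81 and 100.00).
console.log(average(groups['["GA","Franchise",true,"Archives Expense"]']));
```

A complex-key query against such a view (startkey/endkey on the store fields) then returns pre-averaged lines without a second round trip, at the cost of building one index per filter combination.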