From: "Dean Landolt"
To: user@couchdb.apache.org
Date: Mon, 15 Dec 2008 12:46:54 -0500
Subject: Re: Multiple search criteria with ranges
Message-ID: <64a10fff0812150946r7ca6ced2pc1776468bc5533a2@mail.gmail.com>
In-Reply-To: <50512BF7-84AA-4870-89E0-35D22B87D2F0@gmail.com>
On Mon, Dec 15, 2008 at 12:08 PM, ara.t.howard wrote:
>
> On Dec 14, 2008, at 10:06 AM, Dan Woolley wrote:
>
>> I'm researching CouchDB for a project dealing with real estate listing
>> data. I'm very interested in CouchDB because the schemaless nature,
>> RESTful interface, and potential offline usage with syncing fit my
>> problem very well. I've been able to do some prototyping and can search
>> on ranges for a single field very successfully. I'm having trouble
>> wrapping my mind around views for a popular use case in real estate,
>> which is a query like:
>>
>> Price = 350000-400000
>> Beds = 4-5
>> Baths = 2-3
>>
>> Any single range above is trivial, but what is the best model for
>> handling this AND scenario with views? The only thing I've been able to
>> come up with is three views returning doc ids -- which should be very
>> fast -- with an array intersection calculated on the client side.
>> Although I haven't tried it yet, that client-side calculation worries
>> me for a potential database with 1M documents -- the client could be
>> computing the intersection of multiple 100K-element arrays. Is that a
>> realistic calculation?
>>
>> Please tell me there is a better model for dealing with this type of
>> scenario -- or that this use case is not well suited for CouchDB at
>> this time and I should move along.
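The "one view per field" approach under discussion can be sketched as CouchDB-style map functions. Everything here is illustrative, not from the thread: the field names, the sample docs, and the tiny in-memory harness that stands in for CouchDB's view server.

```javascript
// Sketch: one map function per searchable field. In CouchDB each becomes
// its own view, so a range query (startkey/endkey) walks a single sorted
// index. Field names and sample docs are hypothetical.
var rows = [];
var currentId = null;

// Stand-in for CouchDB's emit(), so the map functions run outside couch.
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

function byPrice(doc) { if (doc.price != null) emit(doc.price, null); }
function byBeds(doc)  { if (doc.beds  != null) emit(doc.beds,  null); }
function byBaths(doc) { if (doc.baths != null) emit(doc.baths, null); }

// "Build" a view: run the map over every doc, then sort rows by key --
// the sorted order is what makes startkey/endkey range scans cheap.
function buildView(mapFn, docs) {
  rows = [];
  docs.forEach(function (doc) {
    currentId = doc._id;
    mapFn(doc);
  });
  return rows.slice().sort(function (a, b) { return a.key - b.key; });
}

var docs = [
  { _id: "listing-1", price: 375000, beds: 4, baths: 2 },
  { _id: "listing-2", price: 525000, beds: 3, baths: 2 }
];

var priceView = buildView(byPrice, docs);
// Equivalent of querying the price view with ?startkey=350000&endkey=400000
var hits = priceView.filter(function (r) {
  return r.key >= 350000 && r.key <= 400000;
});
console.log(hits.map(function (r) { return r.id; })); // -> [ 'listing-1' ]
```

Each of the three range queries would return only doc ids (the null values keep rows small), leaving the AND step as an id-set intersection on the client.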
>
> using ruby or js i can compute the intersection of two 100k arrays in
> about a fifth of a second, for example with this code:
>
>     a = Array.new(100_000).map { rand }
>     b = Array.new(100_000).map { rand }
>
>     start_time = Time.now.to_f
>
>     intersection = b & a
>
>     end_time = Time.now.to_f
>
>     puts(end_time - start_time)   #=> 0.197230815887451
>
> and that's on my laptop, which isn't too quick, using ruby, which also
> isn't too quick.
>
> i guess to me it seems like keeping an index per attribute to search by
> and doing refinements is going to be plenty fast, offload cpu cycles to
> the client, and keep the code orthogonal and easy to understand -- you
> have one index per field, period.
>
> in addition it seems like you are always going to have a natural first
> criterion, and that you might be able to use startkey_docid/endkey_docid
> to limit the result sets of the second and third queries to smaller and
> smaller ranges of ids (in the normal case).
>
> cheers.

I don't think it's the actual intersection that kills you, but the
downloading and JSON parsing of what would be megabytes of data (the doc
ids alone for 200k docs would run to quite a few MB).
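For the client-side refinement step itself, an intersection of doc-id lists can be sketched in JavaScript with a Set, which makes each pass a linear scan rather than a nested one. (Note that in Ruby `&` is array intersection while `|` is union.) The ids below are made up:

```javascript
// Intersect several arrays of doc ids by refining the first list against
// a Set built from each subsequent list. Roughly O(total ids), since
// Set.has is a constant-time lookup instead of a per-element array scan.
function intersectIds(lists) {
  return lists.reduce(function (acc, list) {
    var seen = new Set(list);
    return acc.filter(function (id) { return seen.has(id); });
  });
}

// Hypothetical id sets returned by the three range queries:
var byPrice = ['a', 'b', 'c', 'd'];
var byBeds  = ['b', 'c', 'd', 'e'];
var byBaths = ['c', 'd', 'f'];

console.log(intersectIds([byPrice, byBeds, byBaths])); // -> [ 'c', 'd' ]
```

Intersecting the smallest result set first keeps every later pass cheap, which lines up with the "natural first criteria" point above.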