incubator-couchdb-user mailing list archives

From Adam Kocoloski <adam.kocolo...@gmail.com>
Subject Re: Multiple search criteria with ranges
Date Mon, 15 Dec 2008 15:24:04 GMT
Hi Dan, it's not a general-purpose solution, but in the specific  
example you gave where there's only one continuous variable you might  
be able to do something like the following.  Use a map function that  
emits your filterable quantities as a list:

function(doc) { emit([doc.beds, doc.baths, doc.price], null); }

and then query that view multiple times with:

startkey=[4,2,350000]&endkey=[4,2,400000]
startkey=[4,3,350000]&endkey=[4,3,400000]
startkey=[5,2,350000]&endkey=[5,2,400000]
startkey=[5,3,350000]&endkey=[5,3,400000]

The advantage is that you get only the data you need; the disadvantage
is that the number of queries is the product of the number of values
each discrete field can take (2 x 2 = 4 here), so it grows quickly as
you add fields to the filter, and there's no easy way to skip a field
(you'd need to query with every possible value of that field, or else
write an additional view that doesn't emit it).
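
As a rough sketch of the multi-query approach, the helper below builds one
startkey/endkey pair per combination of the discrete fields. The name
buildRangeQueries and its arguments are made up for illustration; only the
startkey and endkey query parameters come from the view API, and the keys
assume the [beds, baths, price] ordering emitted above.

```javascript
// Hypothetical helper (not part of CouchDB): one key range per
// (beds, baths) combination, with price as the only continuous field.
function buildRangeQueries(bedsValues, bathsValues, minPrice, maxPrice) {
  const queries = [];
  for (const beds of bedsValues) {
    for (const baths of bathsValues) {
      // View keys are JSON, so serialize and URL-encode them.
      const startkey = JSON.stringify([beds, baths, minPrice]);
      const endkey = JSON.stringify([beds, baths, maxPrice]);
      queries.push(
        "startkey=" + encodeURIComponent(startkey) +
        "&endkey=" + encodeURIComponent(endkey)
      );
    }
  }
  return queries;
}

// Two bed counts times two bath counts gives four query strings.
const queries = buildRangeQueries([4, 5], [2, 3], 350000, 400000);
```

Each string would be appended to the view URL as its query string; the
results of the separate requests are then concatenated on the client.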

A possible middle ground would be to query this same view with a  
single request:

startkey=[4,2,350000]&endkey=[5,3,400000]

You'll still need to filter the results on the client side, since e.g.  
a 4 bed, 2 bath, $600k listing would get included, but at least the  
data volume would be smaller than doing the whole intersection  
yourself.  If you go this route, the discrete vs. continuous variable  
thing doesn't really matter; just arrange the keys so that the one  
with the greatest discriminating power comes first.
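
The client-side filtering for the single-request variant could look
roughly like this. It's only a sketch: filterRows and the sample rows are
invented for illustration, and it assumes each row's key is the
[beds, baths, price] list emitted by the map function.

```javascript
// Hypothetical client-side filter: keep rows whose key lies inside
// every [min, max] range, position by position.
function filterRows(rows, ranges) {
  return rows.filter((row) =>
    row.key.every(
      (value, i) => value >= ranges[i][0] && value <= ranges[i][1]
    )
  );
}

// Made-up sample of what the single range query might return.
const rows = [
  { id: "a", key: [4, 2, 375000], value: null },
  // Sorts between startkey and endkey, but the price is out of range.
  { id: "b", key: [4, 2, 600000], value: null },
  { id: "c", key: [5, 3, 390000], value: null },
];

// Ranges in key order: beds 4-5, baths 2-3, price 350000-400000.
const matches = filterRows(rows, [[4, 5], [2, 3], [350000, 400000]]);
```

Here the $600k listing is returned by the view because keys are compared
element by element, and the filter drops it afterwards.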

Best, Adam


On Dec 14, 2008, at 12:06 PM, Dan Woolley wrote:

> I'm researching CouchDB for a project dealing with real estate
> listing data.  I'm very interested in CouchDB because its
> schema-less nature, RESTful interface, and potential off-line usage
> with syncing fit my problem very well.  I've been able to do some
> prototyping and search on ranges for a single field very  
> successfully.  I'm having trouble wrapping my mind around views for  
> a popular use case in real estate, which is a query like:
>
> Price = 350000-400000
> Beds = 4-5
> Baths = 2-3
>
> Any single range above is trivial, but what is the best model for  
> handling this AND scenario with views?  The only thing I've been
> able to come up with is three views returning doc ids - which
> should be very fast - with an array intersection calculation on the
> client side.  Although I haven't tried it yet, that client-side
> calculation worries me with a potential database of 1M records -
> the client would potentially be dealing with calculating the  
> intersection of multiple 100K element arrays.  Is that a realistic  
> calculation?
>
> Please tell me there is a better model for dealing with this type of
> scenario - or that this use case is not well suited for CouchDB at
> this time and I should move along.
>
>
> Dan Woolley
> profile:  http://www.linkedin.com/in/danwoolley
> company:  http://woolleyrobertson.com
> product:  http://dwellicious.com
> blog:  http://tzetzefly.com
>
>
>

