couchdb-user mailing list archives

From "ara.t.howard" <ara.t.how...@gmail.com>
Subject Re: Multiple search criteria with ranges
Date Mon, 15 Dec 2008 17:08:24 GMT

On Dec 14, 2008, at 10:06 AM, Dan Woolley wrote:

> I'm researching CouchDB for a project dealing with real estate
> listing data.  I'm very interested in CouchDB because the schemaless
> nature, RESTful interface, and potential off-line usage with
> syncing fit my problem very well.  I've been able to do some
> prototyping and search on ranges for a single field very
> successfully.  I'm having trouble wrapping my mind around views for
> a popular use case in real estate, which is a query like:
>
> Price = 350000-400000
> Beds = 4-5
> Baths = 2-3
>
> Any single range above is trivial, but what is the best model for
> handling this AND scenario with views?  The only thing I've been
> able to come up with is three views returning doc ids - which
> should be very fast - with an array intersection calculation on the
> client side.  Although I haven't tried it yet, that client-side
> calculation worries me with a potential dataset of 1M records -
> the client could be calculating the intersection of multiple
> 100k-element arrays.  Is that a realistic calculation?
>
> Please tell me there is a better model for dealing with this type of
> scenario - or that this use case is not well suited for CouchDB at
> this time and I should move along.

using ruby or js i can compute the intersection of two 100k-element
arrays in about a tenth of a second, for example with this code:



# build two arrays of 100,000 random floats
a = Array.new(100_000) { rand }
b = Array.new(100_000) { rand }

start_time = Time.now.to_f

# Array#& is set intersection (Array#| would be the union)
intersection = b & a

end_time = Time.now.to_f

puts(end_time - start_time)  #=> 0.197230815887451


and that's on my laptop, which isn't too quick, using ruby, which also
isn't too quick.
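
a rough js version of the same test looks like this (just a sketch -
the hash-based intersection here is my own stand-in, timed the same
way):

// build two arrays of 100,000 random numbers
var a = [], b = [];
for (var i = 0; i < 100000; i++) {
  a.push(Math.random());
  b.push(Math.random());
}

var start = new Date().getTime();

// index one array's values in an object, then scan the other -
// effectively the same set intersection the ruby code computes
var seen = {}, intersection = [];
for (var i = 0; i < a.length; i++) seen[a[i]] = true;
for (var j = 0; j < b.length; j++) if (seen[b[j]]) intersection.push(b[j]);

console.log((new Date().getTime() - start) / 1000); // elapsed seconds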


i guess it seems to me like keeping an index per attribute to search
by and doing refinements is going to be plenty fast, will offload cpu
cycles to the client, and will keep the code orthogonal and easy to
understand - you have one index per field, period.
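
for example, the views could be as simple as one trivial map function
per field (the field names here are assumed from your example query):

// _design/listings, view "by_price"
function(doc) { emit(doc.price, null); }

// _design/listings, view "by_beds"
function(doc) { emit(doc.beds, null); }

// _design/listings, view "by_baths"
function(doc) { emit(doc.baths, null); }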

in addition it seems like you are always going to have a natural first
criterion, and you might be able to use startkey_docid/endkey_docid to
limit the result sets of the second and third queries to smaller and
smaller ranges of ids (in the normal case).
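
e.g. something along these lines (db, design doc, and view names are
made up, and the urls are only a sketch):

GET /listings/_design/listings/_view/by_price?startkey=350000&endkey=400000
GET /listings/_design/listings/_view/by_beds?startkey=4&endkey=5
GET /listings/_design/listings/_view/by_baths?startkey=2&endkey=3

intersect the returned ids on the client; once the first query has run
you know the lowest and highest doc ids it returned, and (in the normal
case) startkey_docid/endkey_docid can clamp the later queries toward
that range.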

cheers.


a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama



