incubator-couchdb-user mailing list archives

From "Dean Landolt" <d...@deanlandolt.com>
Subject Re: Multiple search criteria with ranges
Date Mon, 15 Dec 2008 17:46:54 GMT
On Mon, Dec 15, 2008 at 12:08 PM, ara.t.howard <ara.t.howard@gmail.com> wrote:

>
> On Dec 14, 2008, at 10:06 AM, Dan Woolley wrote:
>
>> I'm researching Couchdb for a project dealing with real estate listing
>> data.  I'm very interested in Couchdb because the schemaless nature,
>> RESTful interface, and potential off-line usage with syncing fit my problem
>> very well.  I've been able to do some prototyping and search on ranges for a
>> single field very successfully.  I'm having trouble wrapping my mind around
>> views for a popular use case in real estate, which is a query like:
>>
>> Price = 350000-400000
>> Beds = 4-5
>> Baths = 2-3
>>
>> Any single range above is trivial, but what is the best model for handling
>> this AND scenario with views?  The only thing I've been able to come up with
>> is three views returning doc ids - which should be very fast - with an
>> array intersection calculation on the client side (see the per-field view
>> sketch after the quoted message).  Although I haven't tried it yet, that
>> client side calculation worries me for a potential database with 1M records -
>> the client could end up calculating the intersection of multiple 100K element
>> arrays.  Is that a realistic calculation?
>>
>> Please tell me there is a better model for dealing with this type of
>> scenario - or that this use case is not well suited for Couchdb at this time
>> and I should move along.
>>
>
> using ruby or js i can compute the intersection of two 100k arrays in a
> couple tenths of a second, for example with this code
>
>
>
> a = Array.new(100_000).map { rand }
> b = Array.new(100_000).map { rand }
>
> start_time = Time.now.to_f
>
> intersection = b & a  # Array#& is set intersection (| would be union)
>
> end_time = Time.now.to_f
>
> puts(end_time - start_time)  #=> 0.197230815887451
>
>
> and that's on my laptop, which isn't too quick, using ruby, which also isn't
> too quick.
>
>
> i guess to me it seems like keeping an index for each attribute you search
> by and doing refinements is going to be plenty fast, offload cpu cycles to the
> client, and keep the code orthogonal and easy to understand - you have one
> index per field, period.
>
> in addition it seems like you are always going to have a natural first
> criterion, and you might be able to use startkey_docid/endkey_docid to limit
> the result set of the second and third queries to smaller and smaller ranges
> of ids (in the normal case) - see the request sketch after the quote.
>
> cheers.
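
For concreteness, here is a minimal sketch of the per-field views Dan
describes above. The design document name, view names, and the /_design/...
URL form are illustrative assumptions; the field names are taken from the
question. Each view is queried with an ordinary startkey/endkey range, and
the ids in the returned rows are what the client would intersect:

  // _design/search, view "by_price" - each row's id field is the doc's _id
  function(doc) {
    if (doc.price != null) emit(doc.price, null);
  }

  // view "by_beds"
  function(doc) {
    if (doc.beds != null) emit(doc.beds, null);
  }

  // view "by_baths"
  function(doc) {
    if (doc.baths != null) emit(doc.baths, null);
  }

  GET /listings/_design/search/_view/by_price?startkey=350000&endkey=400000
  GET /listings/_design/search/_view/by_beds?startkey=4&endkey=5
  GET /listings/_design/search/_view/by_baths?startkey=2&endkey=3

The intersection of the three id sets is the client-side step the Ruby timing
above is meant to approximate.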

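And a hedged sketch of the request shape the startkey_docid/endkey_docid
suggestion implies, reusing the made-up names from the sketch above; note
these parameters only disambiguate among rows whose key equals startkey or
endkey, so how much they actually narrow the second and third queries depends
on the data:

  GET /listings/_design/search/_view/by_beds?startkey=4&endkey=5
      &startkey_docid=...&endkey_docid=...

where the elided doc id bounds would come from the first query's results.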

I don't think it's the actual intersection that kills you, but the downloading
and JSON parsing of what would be megabytes of data (just the docids alone
for 200k docs would be quite a few MB).
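
As a rough back-of-envelope, assuming default 32-character UUID doc ids and a
typical view row shaped like {"id":"...","key":350000,"value":null}, each row
is on the order of 70 bytes of JSON, so 200k rows per view works out to
something like 14 MB to download and parse before the intersection even
starts.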
