From: "Dean Landolt"
To: user@couchdb.apache.org
Date: Mon, 15 Dec 2008 12:46:54 -0500
Subject: Re: Multiple search criteria with ranges
Message-ID: <64a10fff0812150946r7ca6ced2pc1776468bc5533a2@mail.gmail.com>
In-Reply-To: <50512BF7-84AA-4870-89E0-35D22B87D2F0@gmail.com>
On Mon, Dec 15, 2008 at 12:08 PM, ara.t.howard wrote:
>
> On Dec 14, 2008, at 10:06 AM, Dan Woolley wrote:
>
>> I'm researching CouchDB for a project dealing with real estate listing
>> data. I'm very interested in CouchDB because the schemaless nature,
>> RESTful interface, and potential offline usage with syncing fit my
>> problem very well. I've been able to do some prototyping and can search
>> on ranges for a single field very successfully. I'm having trouble
>> wrapping my mind around views for a popular use case in real estate,
>> which is a query like:
>>
>> Price = 350000-400000
>> Beds = 4-5
>> Baths = 2-3
>>
>> Any single range above is trivial, but what is the best model for
>> handling this AND scenario with views? The only thing I've been able to
>> come up with is three views returning doc ids -- which should be very
>> fast -- with an array intersection calculated on the client side.
>> Although I haven't tried it yet, that client-side calculation worries
>> me for a potential database with 1M documents -- the client could be
>> computing the intersection of multiple 100K-element arrays. Is that a
>> realistic calculation?
>>
>> Please tell me there is a better model for dealing with this type of
>> scenario -- or that this use case is not well suited for CouchDB at
>> this time and I should move along.
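The "one view per field" approach under discussion can be sketched as CouchDB-style map functions. Everything here is illustrative, not from the thread: the field names, the sample docs, and the tiny in-memory harness that stands in for CouchDB's view server.

```javascript
// Sketch: one map function per searchable field. In CouchDB each becomes
// its own view, so a range query (startkey/endkey) walks a single sorted
// index. Field names and sample docs are hypothetical.
var rows = [];
var currentId = null;

// Stand-in for CouchDB's emit(), so the map functions run outside couch.
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

function byPrice(doc) { if (doc.price != null) emit(doc.price, null); }
function byBeds(doc)  { if (doc.beds  != null) emit(doc.beds,  null); }
function byBaths(doc) { if (doc.baths != null) emit(doc.baths, null); }

// "Build" a view: run the map over every doc, then sort rows by key --
// the sorted order is what makes startkey/endkey range scans cheap.
function buildView(mapFn, docs) {
  rows = [];
  docs.forEach(function (doc) {
    currentId = doc._id;
    mapFn(doc);
  });
  return rows.slice().sort(function (a, b) { return a.key - b.key; });
}

var docs = [
  { _id: "listing-1", price: 375000, beds: 4, baths: 2 },
  { _id: "listing-2", price: 525000, beds: 3, baths: 2 }
];

var priceView = buildView(byPrice, docs);
// Equivalent of querying the price view with ?startkey=350000&endkey=400000
var hits = priceView.filter(function (r) {
  return r.key >= 350000 && r.key <= 400000;
});
console.log(hits.map(function (r) { return r.id; })); // -> [ 'listing-1' ]
```

Each of the three range queries would return only doc ids (the null values keep rows small), leaving the AND step as an id-set intersection on the client.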
>
> using ruby or js i can compute the intersection of two 100k arrays in
> about a fifth of a second, for example with this code:
>
>     a = Array.new(100_000).map { rand }
>     b = Array.new(100_000).map { rand }
>
>     start_time = Time.now.to_f
>
>     intersection = b & a
>
>     end_time = Time.now.to_f
>
>     puts(end_time - start_time)   #=> 0.197230815887451
>
> and that's on my laptop, which isn't too quick, using ruby, which also
> isn't too quick.
>
> i guess to me it seems like keeping an index per attribute to search by
> and doing refinements is going to be plenty fast, offload cpu cycles to
> the client, and keep the code orthogonal and easy to understand -- you
> have one index per field, period.
>
> in addition it seems like you are always going to have a natural first
> criterion, and that you might be able to use startkey_docid/endkey_docid
> to limit the result sets of the second and third queries to smaller and
> smaller ranges of ids (in the normal case).
>
> cheers.

I don't think it's the actual intersection that kills you, but the
downloading and JSON parsing of what would be megabytes of data (the doc
ids alone for 200k docs would run to quite a few MB).
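For the client-side refinement step itself, an intersection of doc-id lists can be sketched in JavaScript with a Set, which makes each pass a linear scan rather than a nested one. (Note that in Ruby `&` is array intersection while `|` is union.) The ids below are made up:

```javascript
// Intersect several arrays of doc ids by refining the first list against
// a Set built from each subsequent list. Roughly O(total ids), since
// Set.has is a constant-time lookup instead of a per-element array scan.
function intersectIds(lists) {
  return lists.reduce(function (acc, list) {
    var seen = new Set(list);
    return acc.filter(function (id) { return seen.has(id); });
  });
}

// Hypothetical id sets returned by the three range queries:
var byPrice = ['a', 'b', 'c', 'd'];
var byBeds  = ['b', 'c', 'd', 'e'];
var byBaths = ['c', 'd', 'f'];

console.log(intersectIds([byPrice, byBeds, byBaths])); // -> [ 'c', 'd' ]
```

Intersecting the smallest result set first keeps every later pass cheap, which lines up with the "natural first criteria" point above.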