incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Querying by foreign keys
Date Sat, 25 Apr 2009 19:48:02 GMT
On Sat, Apr 25, 2009 at 3:40 PM, Brian Candler <B.Candler@pobox.com> wrote:
> On Sat, Apr 25, 2009 at 10:27:33AM -0700, Adam Wolff wrote:
>> Hi list,
>> I have a little query problem that would be easy to solve in SQL, but
>> seems hard in CouchDB. I have data that looks like this:
>> doc1: {
>>    keys : [1,2,3],
>>    data : ....
>> }
>> doc2: {
>>    keys : [1,3,5],
>>    data : ....
>> }
>> ...
>>
>> I'd like a view of my documents that lets me filter on multiple keys,
>> so a query of keys=[1] yields doc1,doc2, as does a query of
>> keys=[1,3].
>
> What do you mean by keys=[1,3]? Does it mean all documents which contains
> either key 1 or 3, or all documents which contains both keys 1 and 3?
>
> In the first case, you can POST to the view with {"keys":[1,3]}
>
> In the latter case, the simplest answer is to fetch the views for key=1 and
> key=3, and intersect them on the client. Actually you can be smarter than
> that: fetch whichever result set is smaller (keys=1 or key=3), and then
> filter the result set on the client to discard those entries which don't
> have the other key.
>
> (This is exactly what a SQL database would do, except it hides it from you
> by doing it all server-side)
>
>> In couch, the only way I can think of to truly index all the
>> combinations of the keys is to emit the n! combinations of the keys,
>> which seems like a bad idea. (Maybe it's not though?)
>
> I think it would be (2^n)-1 rather than n!, since you can sort them first.
>
> e.g. for 5 keys there would be 31 combinations:
> 1 x []           -- useless, ignore
> 5 x [a]
> 10 x [a b]       (a <= b)
> 10 x [a b c]     (a <= b <= c)
> 5 x [a b c d]
> 1 x [a b c d e]
>
> Actually you can halve it again, by realising that if you have emitted
> [1,2,3,4,5] then you don't need to emit any prefixes of this, e.g. [1,2,3,4]
> or [1,2,3] or [1,2] or [1], because a startkey-endkey query can be used to
> find them.
>

I've contemplated something like this, but I'm pretty sure it breaks
when you start contemplating that a key may end up being [a, e] in
which case your querying needs to account for all possible values
between a and e.

In other news, if someone wants to write an inverted index
implementation that could solve this class of queries I'm sure there'd
probably be a couple beers in it.

HTH,
Paul Davis

> Mind you, exponential growth still ain't good :-) It might be acceptable if
> you have a limited number of tags, and you really want an instantaneous
> answer to your multi-tag query across millions of documents.
>
> Other tradeoffs are possible. For example, it might make sense just to emit
> all single tags plus all combinations of two tags; that would be (n^2+n)/2
> combinations. Then your query can instantly find all docs matching two of
> the tags requested, and if the user has asked for three or more, filter
> those at the client. But you still have to choose which 2 tags to query on,
> and it may well not be worth it in practice.
>
> Regards,
>
> Brian.
>

Mime
View raw message