couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Wolff <awo...@gmail.com>
Subject Re: Querying by foreign keys
Date Sun, 26 Apr 2009 04:12:38 GMT
No, I think Brian is right that the growth is exponential, rather than
factorial. In his example, I think for instance
>> 10 x [a b]       (a <= b)
means the pairs [1 2] [1 3] [1 4] [1 5] [2 3] [2 4] [2 5] [3 4] [3 5] [4 5]
A

On Sat, Apr 25, 2009 at 12:48 PM, Paul Davis
<paul.joseph.davis@gmail.com> wrote:
> On Sat, Apr 25, 2009 at 3:40 PM, Brian Candler <B.Candler@pobox.com> wrote:
>> On Sat, Apr 25, 2009 at 10:27:33AM -0700, Adam Wolff wrote:
>>> Hi list,
>>> I have a little query problem that would be easy to solve in SQL, but
>>> seems hard in CouchDB. I have data that looks like this:
>>> doc1: {
>>>    keys : [1,2,3],
>>>    data : ....
>>> }
>>> doc2: {
>>>    keys : [1,3,5],
>>>    data : ....
>>> }
>>> ...
>>>
>>> I'd like a view of my documents that lets me filter on multiple keys,
>>> so a query of keys=[1] yields doc1,doc2, as does a query of
>>> keys=[1,3].
>>
>> What do you mean by keys=[1,3]? Does it mean all documents which contains
>> either key 1 or 3, or all documents which contains both keys 1 and 3?
>>
>> In the first case, you can POST to the view with {"keys":[1,3]}
>>
>> In the latter case, the simplest answer is to fetch the views for key=1 and
>> key=3, and intersect them on the client. Actually you can be smarter than
>> that: fetch whichever result set is smaller (keys=1 or key=3), and then
>> filter the result set on the client to discard those entries which don't
>> have the other key.
>>
>> (This is exactly what a SQL database would do, except it hides it from you
>> by doing it all server-side)
>>
>>> In couch, the only way I can think of to truly index all the
>>> combinations of the keys is to emit the n! combinations of the keys,
>>> which seems like a bad idea. (Maybe it's not though?)
>>
>> I think it would be (2^n)-1 rather than n!, since you can sort them first.
>>
>> e.g. for 5 keys there would be 31 combinations:
>> 1 x []           -- useless, ignore
>> 5 x [a]
>> 10 x [a b]       (a <= b)
>> 10 x [a b c]     (a <= b <= c)
>> 5 x [a b c d]
>> 1 x [a b c d e]
>>
>> Actually you can halve it again, by realising that if you have emitted
>> [1,2,3,4,5] then you don't need to emit any prefixes of this, e.g. [1,2,3,4]
>> or [1,2,3] or [1,2] or [1], because a startkey-endkey query can be used to
>> find them.
>>
>
> I've contemplated something like this, but I'm pretty sure it breaks
> when you start contemplating that a key may end up being [a, e] in
> which case your querying needs to account for all possible values
> between a and e.
>
> In other news, if someone wants to write an inverted index
> implementation that could solve this class of queries I'm sure there'd
> probably be a couple beers in it.
>
> HTH,
> Paul Davis
>
>> Mind you, exponential growth still ain't good :-) It might be acceptable if
>> you have a limited number of tags, and you really want an instantaneous
>> answer to your multi-tag query across millions of documents.
>>
>> Other tradeoffs are possible. For example, it might make sense just to emit
>> all single tags plus all combinations of two tags; that would be (n^2+n)/2
>> combinations. Then your query can instantly find all docs matching two of
>> the tags requested, and if the user has asked for three or more, filter
>> those at the client. But you still have to choose which 2 tags to query on,
>> and it may well not be worth it in practice.
>>
>> Regards,
>>
>> Brian.
>>
>

Mime
View raw message