incubator-couchdb-user mailing list archives

From Jason Smith <...@iriscouch.com>
Subject Re: Building views to locate documents WITHOUT a certain set of tags
Date Thu, 01 Dec 2011 01:01:23 GMT
On Thu, Dec 1, 2011 at 3:49 AM, Rob Crowell <robccrowell@gmail.com> wrote:
> I suppose it would be possible to make multiple queries, using
> startkey and endkey to pull out the ranges.
>
> 1. Sort the "bad" tags: (BROKEN_IMAGE, OFFENSIVE_IMAGE)
> 2. For each bad tag, request documents:
>    i. Query 1:
>        startkey = []
>        endkey = ["BROKEN_IMAGE"]
>
>    ii. Query 2:
>        startkey = ["BROKEN_IMAGE", {}]
>        endkey = ["OFFENSIVE_IMAGE"]
>
>    iii. Query 3:
>        startkey = ["OFFENSIVE_IMAGE", {}]
>        endkey = [{}]
>
> Requires making N+1 queries, which for a fairly small list wouldn't be too bad.

If you have a view of docs matching a condition, you can find docs
*not* matching that condition efficiently: make simultaneous queries
to _all_docs and your view. Both will be sorted by doc id. Iterate
through both at the same time (no need to store them in memory),
spotting ids listed in _all_docs but not in your view.
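A minimal sketch of that merge walk in Python. It assumes you have already pulled the `id` field out of each row of the two responses into iterables of strings; in a real client these would be streamed from `GET /db/_all_docs` and from your view. The function name `ids_not_in_view` and the sample ids are hypothetical. Comparing ids with Python's `<` is also an assumption: it only works if it agrees with the order CouchDB returned, which holds for ids made of digits and lowercase hex, such as CouchDB's default generated UUIDs.

```python
from typing import Iterable, Iterator


def ids_not_in_view(all_doc_ids: Iterable[str],
                    view_ids: Iterable[str]) -> Iterator[str]:
    """Yield ids from all_doc_ids that do not appear in view_ids.

    Both inputs must already be sorted ascending by doc id. Each is read
    once, front to back, so neither has to be held in memory.
    """
    view_iter = iter(view_ids)
    current = next(view_iter, None)  # smallest view id not yet passed
    for doc_id in all_doc_ids:
        # Advance the view cursor past ids that sort before doc_id
        # (this also skips duplicate rows for the same doc).
        while current is not None and current < doc_id:
            current = next(view_iter, None)
        if current != doc_id:
            yield doc_id  # listed in _all_docs but not in the view


# Hypothetical ids: all docs, and the docs the view says match the condition.
all_docs = ["img-001", "img-002", "img-003", "img-004"]
matching = ["img-002", "img-004"]
print(list(ids_not_in_view(all_docs, matching)))
# ['img-001', 'img-003']
```

Because it is a generator over two iterators, the walk consumes each stream only as far as it needs to, which is what keeps memory use flat.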

I wrote this up here: http://stackoverflow.com/a/6210422/2938

Notes:

* If rows have identical keys, CouchDB sorts them by doc id. You can
emit any value for the rows; what's important here is row.id
* You can generalize the technique to perform multiple "NOT" queries
simultaneously.
* This is a situation where concurrent or event-driven languages like
JavaScript or Erlang shine.
* I'm pretty sure that in practice, the "NOT" queries add zero cost to
the query. It always takes the same time to complete: the time to
fetch _all_docs.
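Several "NOT" conditions at once can be sketched the same way in Python: merge the sorted id streams of the excluded conditions into one sorted stream with `heapq.merge`, then walk that stream against _all_docs. This assumes a hypothetical view that emits one row per tag with the tag as the key, so that a query such as `key="OFFENSIVE_IMAGE"` returns rows with identical keys and therefore in doc id order, as the first note says. One such query per disliked tag would then cover the per-user case. The function name and sample ids are hypothetical, and ids are compared with Python's `<`, which is assumed to agree with the server's order (true for lowercase-hex ids such as CouchDB's default UUIDs).

```python
import heapq
from typing import Iterable, Iterator


def ids_not_in_any_view(all_doc_ids: Iterable[str],
                        *excluded: Iterable[str]) -> Iterator[str]:
    """Yield ids from all_doc_ids that appear in none of the excluded streams.

    Every input must already be sorted ascending by doc id. heapq.merge
    combines the excluded streams lazily, so nothing is buffered in full.
    """
    merged = heapq.merge(*excluded)
    current = next(merged, None)
    for doc_id in all_doc_ids:
        # Skip excluded ids that sort before the current _all_docs id.
        while current is not None and current < doc_id:
            current = next(merged, None)
        # An id may repeat (a doc can carry several bad tags); the loop
        # above discards the repeats once a larger doc_id comes along.
        if current != doc_id:
            yield doc_id


# Hypothetical data: one sorted id stream per tag the user dislikes.
all_docs = ["img-001", "img-002", "img-003", "img-004", "img-005"]
broken = ["img-002"]
offensive = ["img-002", "img-005"]
print(list(ids_not_in_any_view(all_docs, broken, offensive)))
# ['img-001', 'img-003', 'img-004']
```

The _all_docs stream is still read exactly once, which fits the observation that the total time is dominated by fetching _all_docs.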

I do not know if this technique has a name. If it doesn't, may I
propose: "The Thai Massage."

>
> On Wed, Nov 30, 2011 at 3:10 PM, Rob Crowell <robccrowell@gmail.com> wrote:
>> Hey everyone, view question here.
>>
>> I've got couch records that represent images.  They may have any
>> number of tags (from zero to hundreds).  However, while there are
>> thousands of tags in the dataset, there are only a couple that are
>> considered "bad" (BROKEN_IMAGE, BLANK_IMAGE, etc.)  Here's an example
>> document:
>>
>> {
>>    _id: ...,
>>    url: "http://example.org/whatever.png",
>>    tags: ["OUTDOORS", "BEACH", "RED_DRESS"]
>> }
>>
>> I wrote a view to emit documents that don't have these "bad" tags by
>> hard-coding the list of bad tags and checking every tag against this
>> list.  If none of the tags are bad, then emit the document.
>>
>> However, a user may also specify tags that he doesn't like
>> (OFFENSIVE_IMAGE, DENVER_BRONCOS, whatever).  Is there any good way to
>> build a view around this idea ("show me all documents that don't have
>> a set of tags") short of defining a custom view (with their own "bad"
>> tags list) for every user?
>>
>> I could do this filtering client-side of course, but if I wanted to
>> generate an exhaustive list of matching documents (for a report or
>> something similar) then it would be a lot of work.  I'm stumped at the
>> moment.  Thanks for any suggestions!
>>
>



-- 
Iris Couch
