On Thu, Dec 1, 2011 at 3:49 AM, Rob Crowell <robccrowell@gmail.com> wrote:
> I suppose it would be possible to make multiple queries, using
> startkey and endkey to pull out the ranges.
>
> 1. Sort the "bad" tags: (BROKEN_IMAGE, OFFENSIVE_IMAGE)
> 2. For each bad tag, request documents:
> i. Query 1:
> startkey = []
> endkey = ["BROKEN_IMAGE"]
>
> ii. Query 2:
> startkey = ["BROKEN_IMAGE", {}]
> endkey = ["OFFENSIVE_IMAGE"]
>
> iii. Query 3:
> startkey = ["OFFENSIVE_IMAGE", {}]
> endkey = [{}]
>
> Requires making N+1 queries, which for a fairly small list wouldn't be too bad.
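The ranges in the quoted steps can be generated mechanically. Here is a rough Python sketch, not a tested recipe. It assumes the view's keys are arrays whose first element is the tag, as the quoted startkey/endkey values suggest. It also assumes Python's sort order agrees with CouchDB's collation for your tag names, which is not guaranteed in general. The name build_ranges is made up for illustration.

```python
def build_ranges(bad_tags):
    """Return the N+1 (startkey, endkey) pairs that skip over each bad tag."""
    ranges = []
    startkey = []  # an empty array sorts before every [tag, ...] key
    for tag in sorted(set(bad_tags)):
        # Stop just before the first key that begins with this bad tag.
        ranges.append((startkey, [tag]))
        # Resume after the last key that begins with this bad tag;
        # an object ({}) collates after other JSON values in CouchDB.
        startkey = [tag, {}]
    # Final range: everything after the last bad tag.
    ranges.append((startkey, [{}]))
    return ranges
```

Each pair would then be JSON-encoded and sent as the startkey and endkey parameters of one view query.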
If you have a view of docs matching a condition, you can find docs
*not* matching that condition efficiently: make simultaneous queries
to _all_docs and your view. Both will be sorted by doc id. Iterate
through both at the same time (no need to store them in memory),
spotting ids listed in _all_docs but not your view.
I wrote this up here: http://stackoverflow.com/a/6210422/2938
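To make the idea concrete, here is a minimal Python sketch of the merge step only. It assumes you already have two iterables of doc ids, one from _all_docs and one from the view, both in the same ascending order, and that Python compares those ids the same way CouchDB sorted them (true for simple ids such as lowercase hex UUIDs, not guaranteed otherwise). The function name ids_not_in_view is hypothetical, and fetching the rows over HTTP is left out.

```python
def ids_not_in_view(all_doc_ids, view_ids):
    """Yield ids present in all_doc_ids but absent from view_ids.

    Both inputs must be sorted ascending in the same order. Each is
    consumed one item at a time, so neither is held in memory.
    """
    view_iter = iter(view_ids)
    current = next(view_iter, None)  # None means the view is exhausted
    for doc_id in all_doc_ids:
        # Skip view rows that sort before this doc, including repeated
        # rows for a doc that was emitted more than once.
        while current is not None and current < doc_id:
            current = next(view_iter, None)
        if current is None or current != doc_id:
            yield doc_id  # in _all_docs but not in the view
```

For example, list(ids_not_in_view(["a", "b", "c", "d"], ["b", "b", "d"])) gives ["a", "c"].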
Notes:
* If rows have identical keys, CouchDB sorts them by doc id. You can
emit any value for the rows; what's important here is row.id.
* You can generalize the technique to perform multiple "NOT" queries
simultaneously.
* This is a situation where concurrent or event-driven languages like
JavaScript or Erlang shine.
* I'm pretty sure that in practice, the "NOT" queries add zero cost to
the query. It always takes the same time to complete: the time to
fetch _all_docs.
I do not know if this technique has a name. If it doesn't, may I
propose: "The Thai Massage."
>
> On Wed, Nov 30, 2011 at 3:10 PM, Rob Crowell <robccrowell@gmail.com> wrote:
>> Hey everyone, view question here.
>>
>> I've got couch records that represent images. They may have any
>> number of tags (from zero to hundreds). However, while there are
>> thousands of tags in the dataset, there are only a couple that are
>> considered "bad" (BROKEN_IMAGE, BLANK_IMAGE, etc.) Here's an example
>> document:
>>
>> {
>> _id: ...,
>> url: "http://example.org/whatever.png",
>> tags: ["OUTDOORS", "BEACH", "RED_DRESS"]
>> }
>>
>> I wrote a view to emit documents that don't have these "bad" tags by
>> hard-coding the list of bad tags and checking every tag against this
>> list. If none of the tags are bad, then emit the document.
>>
>> However, a user may also specify tags that he doesn't like
>> (OFFENSIVE_IMAGE, DENVER_BRONCOS, whatever). Is there any good way to
>> build a view around this idea ("show me all documents that don't have
>> a set of tags") short of defining a custom view (with their own "bad"
>> tags list) for every user?
>>
>> I could do this filtering client-side of course, but if I wanted to
>> generate an exhaustive list of matching documents (for a report or
>> something similar) then it would be a lot of work. I'm stumped at the
>> moment. Thanks for any suggestions!
>>
>
--
Iris Couch