couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeff Hinrichs - DM&T" <dunde...@gmail.com>
Subject Re: View Intersections
Date Wed, 11 Feb 2009 03:19:28 GMT
Just a few  comment to get things started.

On Tue, Feb 10, 2009 at 5:59 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> I've been contemplating implementing a new feature that I've been
> wanting for awhile. There's been  some talk of implementing view
> intersections for a bit now so I figured I'd try and give a summary of
> what the feature would entail in terms of functionality and then the
> necessary bits required for an implementation.
>
> So the original idea for view intersections was exactly what the name
> entails: Show me the intersection between two views for a given set of
> view query parameters. After thinking about different methods of
> implementation I think we can extend this to be more powerful and
> generally applicable.
>
> Major Hurdle 1
> ============
>
> The first necessary bit of ground work would be to implement an
> optional value index on views. The more I thought about intersecting
> views the more I realized it was starting to look pointless. Ignoring
> something along the lines of group_level=N in that we can join on
> array prefixes, all views being joined would require exactly the same
> key. Which begs the question, why not just create 1 view that emits
> the output of the two you want intersected.

 I would argue that returning a simple list of docids that meet the
requirement should suffice -- in fact, the views a and b need not be
homogenous so returning anything beyond docids could end up being a
bigger problem than the intersection itself.

For instance, say we want the intersection of the documents who have
both "blue" and "fuzzy" tags so we use
a = /_view/tags/byval?key="blue"
b = /_view/tags/byval?key="fuzzy"

intersection(a,b)

Now we want to limit that to things named Harold.

c=/_view/name/first?key="Harold"

intersection(intersection(a,b),c)

Which gives us a list of docid's that contain Blue, Fuzzy things named Harold.

However, the values returned by view a and view b are the same,
however the values returned by view c might be completely different.
So returning a view with varying values might not be very helpful
(This is where I am not seeing why more than returning a list of
docid's would be appropriate. Of course I am most likely missing the
point.)  Only returning intersections of similar views would not be as
interesting a returning intersections of dissimilar views.


>
> I couldn't get past this for a long time until I heard Chris Anderson
> pondering adding a btree index to the values in a view. The obvious
> downfalls of the extra space and computation usage are there, but
> making it optional should solve any qualms in that respect.
>
> Given an index on a value we're now able to chain together arbitrary
> views using either the key or value as well as limit the intersection
> by any combination of key and value.
>
> As a side benefit, we would also get the select views by value
> restriction as well. I'm thinking it'd be as transparent as adding a
> [start|end]value and [start|end]value_docid set of URL parameters. I
> haven't followed this train of thought too far into the code yet, but
> something approximating that should be fairly doable. A thought occurs
> that limiting view results by both key and value could be interesting
> in terms of implementation. Not sure if I'd force it through the
> intersection API or not.
>
> Caveats that come to mind are that this would break binary
> compatibility for all generated views. It wouldn't require a
> dump/reload, but it might come as a surprise to people upgrading that
> all their views are regenerating.
>
> Major Hurdle 2
> ============
>
> Implementing the view intersection API. First off, it probably needs a
> new name. Once we have intersections working, unions, subtractions,
> and the NxM one who's name escapes me (cross product floats up but
> sounds not right) should be trivially implementable.
>
> The underlying implementation for this is basically a large merge sort
> running over the view btree's. If you read about the merge step in
> map/reduce/merge that's basically what I've got in my head.
>
> The biggest issue that I've found in getting this implemented
> (excluding a value index) is that I'd need to write a new btree
> traversal method that used iterators instead of a fold mechanism. This
> shouldn't be overly difficult to implement.
>
> Beyond that then it's basically up to the HTTP interface in parameter
> parsing and error checking. For passing parameters I'm thinking along
> the line of a JSON body posted (Premptive: any RESTafarians should
> reference the long discussion on multi-get before writing about how
> this isn't RESTful).

 posting json documents seems to be required and beyond argument given
the technical size limits of a GET request

>
> Also, not sure if it's obvious but I'd plan on allowing arbitrarily
> nested conditions, ie, "intersection(union(a, b), c)" type of
> operations. There's a subtle detail in the sort order and thus
> corresponding btree traversal that might come into play there. I can
> punt and make the entire request use one sort order, as in the
> previous example can't specify different sort directions for the two
> nested operations because you'd get a (presumably) zero overlap in the
> end. I'm pretty sure if we force all btrees to be traversed in the
> same direction for each request we don't lose any functionality
> though.
3) Since it is a set operation, as long as operator precedence is
observed, the order of processing or traversing an index within the
set should make no difference to the result.  The intersection of the
sets {1, 2, 3} and {2, 3, 4} == {3, 1, 2} and {4, 2, 3}  == {2, 3} ==
{3, 2}

> Comments
> =========
>
> That's the general outline I've got in my head right now. I'm pretty
> sure I can see 95% of the implementation, but it's possible I'm
> missing a finer detail somewhere. If you've got questions or comments
> let's hear them. If there's no general objection then I can probably
> get to starting an implementation at the end of this week.
>
> Thanks,
> Paul Davis
>

Hat in hand,

Jeff

Mime
View raw message