couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <>
Subject View Intersections
Date Tue, 10 Feb 2009 23:59:44 GMT
I've been contemplating implementing a new feature that I've been
wanting for awhile. There's been  some talk of implementing view
intersections for a bit now so I figured I'd try and give a summary of
what the feature would entail in terms of functionality and then the
necessary bits required for an implementation.

So the original idea for view intersections was exactly what the name
entails: Show me the intersection between two views for a given set of
view query parameters. After thinking about different methods of
implementation I think we can extend this to be more powerful and
generally applicable.

Major Hurdle 1

The first necessary bit of ground work would be to implement an
optional value index on views. The more I thought about intersecting
views the more I realized it was starting to look pointless. Ignoring
something along the lines of group_level=N in that we can join on
array prefixes, all views being joined would require exactly the same
key. Which begs the question, why not just create 1 view that emits
the output of the two you want intersected.

I couldn't get past this for a long time until I heard Chris Anderson
pondering adding a btree index to the values in a view. The obvious
downfalls of the extra space and computation usage are there, but
making it optional should solve any qualms in that respect.

Given an index on a value we're now able to chain together arbitrary
views using either the key or value as well as limit the intersection
by any combination of key and value.

As a side benefit, we would also get the select views by value
restriction as well. I'm thinking it'd be as transparent as adding a
[start|end]value and [start|end]value_docid set of URL parameters. I
haven't followed this train of thought too far into the code yet, but
something approximating that should be fairly doable. A thought occurs
that limiting view results by both key and value could be interesting
in terms of implementation. Not sure if I'd force it through the
intersection API or not.

Caveats that come to mind are that this would break binary
compatibility for all generated views. It wouldn't require a
dump/reload, but it might come as a surprise to people upgrading that
all their views are regenerating.

Major Hurdle 2

Implementing the view intersection API. First off, it probably needs a
new name. Once we have intersections working, unions, subtractions,
and the NxM one who's name escapes me (cross product floats up but
sounds not right) should be trivially implementable.

The underlying implementation for this is basically a large merge sort
running over the view btree's. If you read about the merge step in
map/reduce/merge that's basically what I've got in my head.

The biggest issue that I've found in getting this implemented
(excluding a value index) is that I'd need to write a new btree
traversal method that used iterators instead of a fold mechanism. This
shouldn't be overly difficult to implement.

Beyond that then it's basically up to the HTTP interface in parameter
parsing and error checking. For passing parameters I'm thinking along
the line of a JSON body posted (Premptive: any RESTafarians should
reference the long discussion on multi-get before writing about how
this isn't RESTful).

Also, not sure if it's obvious but I'd plan on allowing arbitrarily
nested conditions, ie, "intersection(union(a, b), c)" type of
operations. There's a subtle detail in the sort order and thus
corresponding btree traversal that might come into play there. I can
punt and make the entire request use one sort order, as in the
previous example can't specify different sort directions for the two
nested operations because you'd get a (presumably) zero overlap in the
end. I'm pretty sure if we force all btrees to be traversed in the
same direction for each request we don't lose any functionality


That's the general outline I've got in my head right now. I'm pretty
sure I can see 95% of the implementation, but it's possible I'm
missing a finer detail somewhere. If you've got questions or comments
let's hear them. If there's no general objection then I can probably
get to starting an implementation at the end of this week.

Paul Davis

View raw message