couchdb-dev mailing list archives

From "Luke Burton (JIRA)" <>
Subject [jira] Commented: (COUCHDB-707) Proposal for "Filter Views"
Date Thu, 25 Mar 2010 18:46:27 GMT


Luke Burton commented on COUCHDB-707:

Not far, because the objective is to perform the filter on data available in the view, and
only do the expensive fetch of the entire document when the filter criteria are met.

Say I have a Couch database full of images with metadata. My goal is to fetch a bunch of images
that contain a particular tag, from a particular author, of less than a particular focal length.

To do this, I could build a filter view that emits [tags, author, focalLength]. I pass in
my match criteria as HTTP parameters. The filter view would enumerate each row and check whether
req.tag is in row.tags, whether req.author = row.author, and whether req.focalLength >
row.focalLength. I could then emit only the complete documents that match.
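
A minimal sketch of the per-row check described above. Filter views are only a proposal, so nothing here is a CouchDB API: filterRow, the sample rows, and the req object are all hypothetical, and the snippet simply simulates the idea in plain JavaScript.

```javascript
// Hypothetical simulation: CouchDB has no "filter view" API.
// Each sample row mimics a view row whose value is [tags, author, focalLength].
const rows = [
  { id: "img-1", value: [["sunset", "beach"], "alice", 35] },
  { id: "img-2", value: [["sunset"], "bob", 50] },
  { id: "img-3", value: [["sunset", "city"], "alice", 200] },
];

// Hypothetical predicate: true only when the row meets all three criteria.
function filterRow(row, req) {
  const [tags, author, focalLength] = row.value;
  return (
    tags.includes(req.tag) &&
    author === req.author &&
    req.focalLength > focalLength
  );
}

// req stands in for the already-parsed HTTP request parameters.
const req = { tag: "sunset", author: "alice", focalLength: 100 };

// Only the IDs of matching rows survive; full documents would be fetched afterwards.
const matchingIds = rows.filter((row) => filterRow(row, req)).map((row) => row.id);
console.log(matchingIds);
```

Real query parameters arrive as strings, so a numeric comparison like the focal length check would need an explicit conversion first.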

To do this with a list view, I would need to supply include_docs=true, to get access to the
entire image document so I could actually return it upon a match. This means Couch is loading
a potentially multi-gigabyte view, documents included, into memory and then handing it off to
the list view JavaScript for transformation. Expensive! What if only five images actually match? :)
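
For comparison, here is a rough sketch of the list view approach. Inside CouchDB a list function has the shape function(head, req) and uses getRow() and send(); the stubs, sample rows, and the name listMatchingImages below are assumptions added so the sketch runs standalone, and the syntax is modern JavaScript rather than what the 0.11 view server accepts.

```javascript
// Stand-ins for the getRow() and send() helpers that CouchDB supplies to list functions.
// Every row already carries its full doc, which is the cost of include_docs=true.
const pending = [
  { id: "img-1", value: [["sunset", "beach"], "alice", 35], doc: { _id: "img-1", author: "alice" } },
  { id: "img-2", value: [["sunset"], "bob", 50], doc: { _id: "img-2", author: "bob" } },
  { id: "img-3", value: [["sunset", "city"], "alice", 200], doc: { _id: "img-3", author: "alice" } },
];
const output = [];
function getRow() { return pending.shift(); } // undefined once the rows run out
function send(chunk) { output.push(chunk); }

// List function sketch: every row is visited, but only matching docs are sent.
function listMatchingImages(head, req) {
  const wanted = req.query; // query parameters arrive as strings
  let first = true;
  let row;
  send("[");
  while ((row = getRow())) {
    const [tags, author, focalLength] = row.value;
    const isMatch =
      tags.includes(wanted.tag) &&
      author === wanted.author &&
      Number(wanted.focalLength) > focalLength;
    if (isMatch) {
      send((first ? "" : ",") + JSON.stringify(row.doc));
      first = false;
    }
  }
  send("]");
}

const request = { query: { tag: "sunset", author: "alice", focalLength: "100" } };
listMatchingImages(null, request);
console.log(output.join(""));
```

The filter itself only reads row.value, yet the doc for every row has to be loaded before the function sees it.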

As I mentioned above, you can do all this on the client side - fetch the view, process it,
get a list of IDs, then fetch them - but it requires multiple calls over the wire. And it's
putting what I consider to be "database oriented" stuff into the front end, rather than in
the database itself ...
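
The client-side round trip can be sketched roughly as follows. The database URL, the sample view response, and the helper names selectIds and bulkFetchRequest are assumptions for illustration; the second request uses CouchDB's _all_docs endpoint with a "keys" body and include_docs=true. No network call is made here, the sketch only builds the request.

```javascript
// Step 1 (assumed already done): the client fetched the view over HTTP and parsed it.
const viewResponse = {
  rows: [
    { id: "img-1", value: [["sunset", "beach"], "alice", 35] },
    { id: "img-2", value: [["sunset"], "bob", 50] },
    { id: "img-3", value: [["sunset", "city"], "alice", 200] },
    { id: "img-4", value: [["sunset"], "alice", 85] },
  ],
};

// Step 2: filter the rows in the client, keeping only the document IDs.
function selectIds(viewRows, criteria) {
  return viewRows
    .filter(({ value: [tags, author, focalLength] }) =>
      tags.includes(criteria.tag) &&
      author === criteria.author &&
      criteria.focalLength > focalLength)
    .map((row) => row.id);
}

// Step 3: describe the second HTTP request that fetches the full documents.
function bulkFetchRequest(dbUrl, ids) {
  return {
    method: "POST",
    url: dbUrl + "/_all_docs?include_docs=true",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ keys: ids }),
  };
}

const criteria = { tag: "sunset", author: "alice", focalLength: 100 };
const ids = selectIds(viewResponse.rows, criteria);
// "http://localhost:5984/images" is a placeholder database URL.
const secondRequest = bulkFetchRequest("http://localhost:5984/images", ids);
console.log(secondRequest.url, secondRequest.body);
```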

> Proposal for "Filter Views"
> ---------------------------
>                 Key: COUCHDB-707
>                 URL:
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: JavaScript View Server
>    Affects Versions: 0.11
>            Reporter: Luke Burton
> A common operation I find myself performing repeatedly is:
> * request a view (maybe with some basic filter like "keys" or a range of keys)
> * in my client, filter this view based on some complex criteria, leaving me with a small
>   set of document IDs (complex as in array intersections, compound boolean operations, &
>   other stuff not possible in the HTTP view API)
> * go back to Couch and fetch the complete documents for these IDs.
> List Views almost get me to the point of doing this purely in Couch. I can enumerate
> over a view and do some complex things with it. But I can't output entire documents, unless
> I use the include_docs=true flag, which murders the performance of the list view. Apparently
> this is because the entire view is fetched with docs included, THEN passed on to the list view JS.
> Typically my complex filter criteria are contained in the view itself, so there is no need
> to fetch the entire document until I know I have a match.
> In summary, a Filter View would execute some arbitrary JavaScript on each view row, with
> access to HTTP request parameters, and return "true" for rows that match. The output would
> be a list of IDs for which the function returned true. include_docs=true would include the
> matching documents.
> Performance would certainly not be as good as fetching a raw view, but it would indisputably
> be better than fetching the entire view over HTTP to a client, deserializing the JSON, doing
> some stuff, then making another HTTP request, and deserializing more JSON.
> I looked at the various entry points for list views in the Couch source. Unfortunately
> it will take me some time to come up to speed with the source (if I ever have the time ...),
> and I hope that what I'm asking for could be a simple extension to the List Views for someone
> very familiar with this area.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
