couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Improving pagination for views
Date Tue, 24 Nov 2009 19:59:25 GMT
On Tue, Nov 24, 2009 at 2:33 PM, Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
> This weekend I tried to implement pagination for one of the modules of
> a Couch-based site I maintain. Here's a rambling account of things
> that I don't like and ideas about how maybe it could be improved.
>
> In short: it kind of sucks. I usually prefer to get a list of pages so
> people can jump around a little bit more, and because it provides
> feedback about the size of an item list (I was working with a photo
> gallery in this case). On MySQL-based databases, I'd usually issue a
> single count query, which could be quite fast, to count the matching
> items, then divide that by page size, use the standard LIMIT/OFFSET
> approach to pagination.
>
> On Couch, as far as I can see, pre-issuing a count query means
> actually retrieving all the row data, which doesn't seem to scale
> nicely.

Or using the standard row count reduce:

function(keys, values, rereduce) {
  if(rereduce) return sum(values);
  return values.length;
}

It occurs to me that we should add a _row_count Erlang reduce function
as its slightly different than _sum.

>
> Apparently, the alternative solution is to page sequentially, meaning
> I retrieve one extra document and pass its document id so that I can
> start the next page from there. That gets me a single next page link
> (works okayish so far). Then, if I also want to provide a previous
> page link, I'd have to either pass current startkey to the next page
> (ugly URLs, more state than I'd like; also, switching on existence of
> parameter instead of just parameter value + default value to check for
> the first page), or retrieve twice the page size + 1.
>

Paging backwards works by using the current first key in the result
set and using reverse=true.

> Conclusion: I'd like to be able to specify to a view, take a little
> bit more time but give me some more metadata to work with.
>
> Currently I get total rows for the view and offset from the first
> index to the current startkey. In the CouchDB book's pagination story,
> that means the offset changes on every page, and using limit means I
> can't really use rows.length to calculate the offset of the end of
> this result set. It would be swell if I could just get the index
> offset for the startkey and the endkey, ignoring skip/limit and
> whatever else. Since it seems retrieval of those index offsets would
> not be too expensive (Jan say O(large m + n)), it seems like a nice
> idea to provide e.g. a meta=true option on views that will do the
> extra work to retrieve those indexes for me and make pagination a
> whole lot easier.

I'm not sure what you mean by expensive. The current method for
start/end keys is to do a tree descent to the start key and then do a
linear seek along the leaves to the end key (but maybe stopping
prematurely if we process Limit rows after we ignore Skip rows).
Getting indexes for start and end keys could very well be done in two
tree descents though. Or you could use a reduce query to get the row
count.

The underlying issue is that the implementation of skip isn't overly
efficient because it requires scanning the index linearly to discard
rows. Similar to why OFFSET in SQL can start to take time when you
specify offsets of > 1M records depending on the query. Strictly
speaking this is only due to the implementation. I wrote an optimized
skip once to see if I understood how the btree worked. It basically
worked like a fast forward by running up and down the tree instead of
seeking linearly. I never got around to testing it thoroughly from a
performance aspect as it was just to prove to myself I understood how
the couch_btree:stream* functions worked. And its not implementable
with grouped reduce queries since we don't persist the reduce results
in a btree.

> Sorry for the rambling, I don't think I can express it any clearer (I
> tried really hard in IRC, and it still took a few hours to get some
> people to understand all this). Does this sound at all sensible?
>
> Cheers,
>
> Dirkjan
>

HTH,
Paul Davis

Mime
View raw message