couchdb-user mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject Re: 'Grouping' documents so that a set of documents is passed to the view function
Date Fri, 26 Jun 2009 09:22:21 GMT
On Fri, Jun 26, 2009 at 05:01:37PM +0800, hhsuper wrote:
>    by the way caculated in reduce function, can you give a example how to
>    impl sorting by any of the col1/col2/col3/v1/v2 column value with
>    couchdb (multi-view also accepted), because the sorting is based the
>    all data

Sorting is done by map views:

  emit(some_value, ...);

When you access the view, it is sorted by key. If you want to sort by
different columns, then either have multiple views, or a single tagged view:

  emit(["by_foo", doc.foo||null], null);
  emit(["by_bar", doc.bar||null], null);
  emit(["by_baz", doc.baz||null], null);
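As a rough illustration of the tagged view, here is a sketch that runs a map function over a few made-up documents and sorts the rows locally. The documents, the emit() stand-in and the queryTag() helper are assumptions for the example, not part of CouchDB, and the comparison is a simplification of CouchDB's real collation. Against a real view you would select one tag with something like startkey=["by_foo"]&endkey=["by_foo",{}].

```javascript
// Hypothetical map function for a single "tagged" view (two tags shown).
function map(doc) {
  emit(["by_foo", doc.foo || null], null);
  emit(["by_bar", doc.bar || null], null);
}

// Local stand-in for the view server's emit(); illustration only.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Made-up sample documents.
const docs = [
  { _id: "a", foo: 3, bar: "x" },
  { _id: "b", foo: 1, bar: "z" },
  { _id: "c", foo: 2, bar: "y" },
];
for (const doc of docs) {
  currentId = doc._id;
  map(doc);
}

// Simplified ordering: tag first, then value. Real CouchDB collation
// handles mixed types and Unicode; this only compares like with like.
function cmp(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}
rows.sort((r1, r2) => cmp(r1.key[0], r2.key[0]) || cmp(r1.key[1], r2.key[1]));

// Stand-in for querying one tag's key range; returns doc ids in view order.
function queryTag(tag) {
  return rows.filter((r) => r.key[0] === tag).map((r) => r.id);
}
```

Here queryTag("by_foo") lists the documents ordered by foo, and queryTag("by_bar") lists the same documents ordered by bar, all from one index.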

If you want to select a subset of data (based on attribute foo), and then sort
by attribute bar, use compound keys:

  emit([doc.foo||null, doc.bar||null], null);

Then access using startkey=["foo1"]&endkey=["foo1",{}]. This is basically
the same as composite indexes in an RDBMS, although more flexible because
you can use derived data for the indexes, rather than just the raw data from
your 'columns'.
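The {} in the endkey works because objects collate after every other JSON type in view keys, so ["foo1",{}] sorts after any ["foo1", bar] row. Since the keys are JSON, they need URL-encoding in the query string. A minimal sketch of building such a request URL follows; the host, the database "mydb", the design document "app" and the view name "by_foo_bar" are all made up for the example.

```javascript
// Build a view URL that selects every row whose key starts with fooValue,
// returned in order of the second key element (bar).
// baseUrl is assumed to point at a view using the compound-key emit above.
function compoundRangeUrl(baseUrl, fooValue) {
  const startkey = encodeURIComponent(JSON.stringify([fooValue]));
  // {} collates after all other JSON values, so it closes the range.
  const endkey = encodeURIComponent(JSON.stringify([fooValue, {}]));
  return baseUrl + "?startkey=" + startkey + "&endkey=" + endkey;
}

// Hypothetical server, database, design document and view names.
const url = compoundRangeUrl(
  "http://localhost:5984/mydb/_design/app/_view/by_foo_bar",
  "foo1"
);
```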

If there is filtering that can't be done in a map function, then you do it
client side. Ditto for sorting after the filtering.

Reduce views by default return only one value, so sorting doesn't mean
anything. But if you do a grouped reduce view, then the values are sorted by
the group key.
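To make the grouped case concrete, here is a sketch that mimics what group=true does with a summing reduce function. The groupedReduce() helper and the sample rows are assumptions for illustration; in CouchDB the grouping is done by the server, sum() is supplied by the view server, and the keys argument holds [key, docid] pairs rather than null.

```javascript
// sum() is provided by CouchDB's JavaScript view server; it is defined
// here only so that the sketch runs standalone.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// Reduce function in the shape CouchDB expects.
function reduce(keys, values, rereduce) {
  return sum(values);
}

// Local stand-in for ?group=true: walk map rows that are already sorted
// by key and reduce each run of equal keys to a single row.
function groupedReduce(sortedRows) {
  const groups = [];
  for (const row of sortedRows) {
    const last = groups[groups.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(row.key)) {
      last.values.push(row.value);
    } else {
      groups.push({ key: row.key, values: [row.value] });
    }
  }
  // keys is passed as null here; the sketch's reduce ignores it.
  return groups.map((g) => ({ key: g.key, value: reduce(null, g.values, false) }));
}

// Made-up map output: one row per document, value 1, sorted by key.
const mapRows = [
  { key: "apple", value: 1 },
  { key: "apple", value: 1 },
  { key: "pear", value: 1 },
];
const grouped = groupedReduce(mapRows);
```

The result has one row per distinct key, in key order, which is the only ordering a grouped reduce view gives you.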

>    when with million records it's unable to load them to client
>    and then sorting, so the couchdb level sorting is needed to me.

Is it not just as unacceptable to ask your RDBMS to fetch one million
records and sort them before returning a handful of them to the client? I
think you are just hiding your problem. Maybe your RDBMS is clever enough to
keep the one million sorted records cached ready for the next query for the
next handful of records... but maybe it is not.

This, in my opinion, is one of the best things about CouchDB. It forces you
to be explicit about your indexing and searching algorithms, and exposes the
weak points for you.

Anyway, unless you are experiencing a problem right now, why worry about it?
Remember "You Ain't Gonna Need It" and "Do The Simplest Thing Which Can
Possibly Work". Build your app and see how it performs. If you reach a
scaling problem which you absolutely cannot deal with under CouchDB, it's
easy to export the data and move it into something else. Or you can take
periodic snapshots of your data and load them into some other app, whilst
keeping CouchDB for loading, editing and searching.

>    also paging is the same thing, when i impl sorting/paging  together in
>    couchdb level, i need consider the startkey for descending param, it's
>    especially difficult (i impl paging with couchdb's startkey and limit
>    param now).

That's basically the right way to do paging. It's easy to do "jump to next
page" and "jump to previous page". If you really want to do "skip to page N"
then you can use skip and limit instead, which is less efficient as you get
further away from the start. This is exactly the same inefficiency you would
get using offset and limit in a SQL query, of course. In other words:
CouchDB exposes a highly efficient way to do paging which SQL doesn't offer.
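
One common way to do the "next page" step is to ask for one row more than the page size and use that extra row's key and document id as startkey and startkey_docid for the following request. The sketch below simulates this against an in-memory array; fetchPage() and the sample rows are made up for the example and stand in for a real view request with limit set to the page size plus one.

```javascript
// rows: view rows already sorted by key, then by id, as a view returns them.
// start: null for the first page, or { key, id } taken from the previous
// page's "next" field (the analogue of startkey and startkey_docid).
function fetchPage(rows, pageSize, start) {
  let from = 0;
  if (start) {
    from = rows.findIndex(
      (r) => r.key > start.key || (r.key === start.key && r.id >= start.id)
    );
    if (from === -1) from = rows.length;
  }
  // Equivalent of limit = pageSize + 1: the extra row is not displayed,
  // it only tells us where the next page begins.
  const slice = rows.slice(from, from + pageSize + 1);
  const extra = slice.length > pageSize ? slice[pageSize] : null;
  return {
    rows: slice.slice(0, pageSize),
    next: extra ? { key: extra.key, id: extra.id } : null,
  };
}

// Made-up rows, including a duplicate key to show why the id matters.
const allRows = [
  { key: 1, id: "a" },
  { key: 2, id: "b" },
  { key: 2, id: "c" },
  { key: 3, id: "d" },
  { key: 4, id: "e" },
];
const page1 = fetchPage(allRows, 2, null);
const page2 = fetchPage(allRows, 2, page1.next);
const page3 = fetchPage(allRows, 2, page2.next);
```

Each request starts directly at the saved key, so the cost does not grow with the page number the way skip does. A null next marks the last page.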

Anyway, I think I've reached my limit on this conversation. If you are
convinced that CouchDB is not what you want, then please go ahead and use
something else, rather than trying to convince us why CouchDB is wrong for
your application :-)

Regards,

Brian.
