From: Brian Candler <b.candler@pobox.com>
To: user@couchdb.apache.org
Subject: Re: The Blog
Date: Mon, 9 Feb 2009 15:02:03 +0000

> > CouchDB won't allow you to "jump to page X", but if you look at
> > e.g. Google, it doesn't work either. [...]
> > But surrogate keys are considered harmful and I'd say (but that
> > really depends on the application), not very helpful.
>
> I guess I was assuming that CouchDB, due to its different nature, has
> a sophisticated solution for this. But apparently pagination is a
> problem that is really hard to solve.

It seems to me that CouchDB is at least no worse than an RDBMS here. Every
RDBMS I know of builds its indexes using B-trees, so if you do

    SELECT ... FROM db ORDER BY k OFFSET 5000 LIMIT 10

then you're forcing the SQL database to traverse its B-tree index past 5000
entries, retrieve the next 10, and then use those to find the rows you're
interested in. If my understanding is right, exactly the same is true of
CouchDB if you use the skip/limit options on a view.

Both can use relative paging (e.g. SELECT ... WHERE k >= startkey) if
you're only interested in "next page" / "previous page"; that's what I'd
use for very large datasets. You can easily build links for the next 10
pages (say) by selecting more rows than you need to display on the first
page.
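As a rough sketch of that relative-paging approach (the database and view
names, and the page size, are invented for illustration):

    // Relative paging against a CouchDB view: fetch one row more than we
    // display, and use the extra row's key as the startkey of the next
    // page. (Note `limit` was spelled `count` before CouchDB 0.9.)
    var PAGE_SIZE = 10;

    function pageUrl(startkey) {
      var url = "/blog/_design/posts/_view/by_date?limit=" + (PAGE_SIZE + 1);
      if (startkey !== undefined) {
        url += "&startkey=" + encodeURIComponent(JSON.stringify(startkey));
      }
      return url;
    }

    // Split one response's rows into the page to display and the startkey
    // for the "next page" link (null on the last page). With duplicate
    // keys you would also need to carry startkey_docid.
    function splitPage(rows) {
      return {
        page: rows.slice(0, PAGE_SIZE),
        nextStartkey: rows.length > PAGE_SIZE ? rows[PAGE_SIZE].key : null
      };
    }

Selecting ten pages' worth of rows up front works the same way: the keys at
offsets PAGE_SIZE, 2*PAGE_SIZE, ... become the startkeys of direct links to
each of the next ten pages.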
However, CouchDB offers you a number of options which a SQL database
doesn't. For instance:

1. When you generate your view, you can emit the entire document (or any
   selection of fields of special interest) as the value. This means that
   the index query which returns 10 keys can return the 10 documents as
   well; a SQL database may have to do 10 additional disk seeks to return
   the rows. There is a tradeoff in index disk space used, of course, but
   the choice is up to you. (A short sketch of this follows below, after
   my sig.)

2. Updating. If you do 500 INSERTs followed by one SELECT in a SQL
   database then, unless you use admin-level tricks like temporarily
   disabling indexing, all affected indexes will be updated for every
   INSERT. With CouchDB, you get a single update of all the indexes when
   the "SELECT" (the view query) takes place. This may add some latency to
   that query, but it's far less work than updating the indexes 500 times.

3. The reduce data structure is extremely smart. If there are N documents
   stored in one B-tree node, then the pre-computed reduce value for those
   N documents is stored in that node too. So if you ask for an aggregate
   value over K1..K4, and that range spans some whole B-tree nodes, only
   the partial nodes at each end need to be re-reduced:

       ________K1___ _______________ ________________ _____K4_________
               <---> <-------------> <--------------> <----->
              reduce     already          already      reduce
                R1'    reduced R2       reduced R3       R4'

   and CouchDB just calculates reduce(R1',R2,R3,R4') to get the final
   answer.

   In principle, an RDBMS could use the same kind of logic for

       SELECT COUNT(*) FROM db WHERE k BETWEEN 'k1' AND 'k4'

   but I don't know if any of them do, and I highly doubt that they do it
   for arbitrary aggregation functions like

       SELECT SUM(n) FROM db WHERE k BETWEEN 'k1' AND 'k4'

   CouchDB makes this trivial and highly efficient, because you explicitly
   declare which summary values you want handled in this way.

The downside, of course, is that you have to *plan* for this in CouchDB,
by building your views appropriately. A SQL database can take an arbitrary
query and have a stab at it using whatever combination of index scans and
table reads it thinks appropriate. But having seen SQL databases make very
bad decisions in this area, I don't consider it something to trumpet
about.

The other downside is joins across multiple 'tables', which in CouchDB
means one document cross-referencing another. You have to build your view
with multiple rows, one from each document, and combine them in the client
(there's a sketch of this below too). This isn't particularly hard, but it
does negate the reduce functionality.

> You still need one lookup for every blog entry on a page.
> And there is no way you can ever store the comment count inside the blog
> entry.

I'm not sure what an RDBMS offers here that CouchDB does not. A simple
map/reduce query will give you the comment count for a blog entry (and it
will scale to millions of comments). Sure, you can construct a SQL join
which gives you the blog entry plus its comment count in one go, but the
SQL database is doing the same sort of work behind the scenes.

If you have multiple blog entries on a page, a single CouchDB group query
can give you all the comment counts in one go. If the keyspace is
contiguous (e.g. keyed by blog posting date) then it's easy (*). And even
if it isn't, you can use the POST multiple-fetch API to get the comment
counts for an arbitrary set of blog entries in one request (a sketch of
this is at the end of this mail).

But perhaps I'm missing something from your requirements.

Regards,

Brian.
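P.S. A minimal sketch of point 1 above: a map function that emits the
whole document as the value, so the view query which finds the 10 keys
returns the 10 documents with them. The `type` and `posted_at` fields are
assumptions for illustration:

    // Emit the entire document as the view value; a single view query
    // then returns the documents along with their keys, at the cost of
    // duplicating each document in the index.
    function (doc) {
      if (doc.type === "blog_entry") {
        emit(doc.posted_at, doc);
      }
    }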
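And a sketch of the cross-document 'join' pattern (field names again
assumed): the entry and each of its comments emit rows under a common key
prefix, so they collate together and the client stitches them up:

    // View-collation "join": rows for a blog entry and its comments sort
    // adjacently, keyed [entry_id, ordinal].
    function (doc) {
      if (doc.type === "blog_entry") {
        emit([doc._id, 0], doc);            // the entry sorts first
      } else if (doc.type === "comment") {
        emit([doc.blog_entry_id, 1], doc);  // its comments follow
      }
    }

Querying with startkey=["ID"]&endkey=["ID",2] then returns the entry
followed by all its comments in one response. Since this view exists
purely for collation, it has no useful reduce, which is the negation
mentioned above.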
(*) If each comment document has a blog_entry_id, then you can emit
something like

        keys                                 values

        ["2009/02/01/entry1","comment1"]     null
        ["2009/02/01/entry1","comment2"]     null
        ["2009/02/09/entry2","comment3"]     null
        ["2009/02/09/entry2","comment4"]     null
        ["2009/02/09/entry2","comment5"]     null

and use a counting reduce function:

    function (keys, values, rereduce) {
      if (rereduce) {
        return sum(values);    // combine counts computed further down the tree
      } else {
        return values.length;  // count the raw rows
      }
    }

For the comment counts of all blog entries this month, ask for

    group=true&group_level=1&startkey=["2009/02/01"]&endkey=["2009/03/01"]

Getting the text of all these blog entries would be a separate query.

I think this shows that in CouchDB there is an advantage to using a doc id
which has relevance to the application. The SQL normalization brigade
would say: use a random UUID for every blog entry and every comment. If
you do, I agree that makes this sort of aggregation a bit harder. But I
think the multi-fetch API should still work.
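As a sketch of that last point, assuming a variant of the view keyed on
blog_entry_id alone (with the same counting reduce; the database and view
names are invented): POST the arbitrary set of entry ids and read back one
count per entry:

    // Multi-fetch of comment counts for an arbitrary set of entries.
    // group=true gives one reduced row per distinct requested key.
    // Synchronous XMLHttpRequest is used only to keep the sketch
    // self-contained.
    function commentCounts(entryIds) {
      var xhr = new XMLHttpRequest();
      xhr.open("POST",
               "/blog/_design/comments/_view/count_by_entry?group=true",
               false);
      xhr.setRequestHeader("Content-Type", "application/json");
      xhr.send(JSON.stringify({ keys: entryIds }));
      return JSON.parse(xhr.responseText).rows;  // [{key: id, value: count}, ...]
    }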