couchdb-user mailing list archives

From Joan Touzet <woh...@apache.org>
Subject Re: How does indexing really work?
Date Tue, 28 Oct 2014 17:50:15 GMT
Hi Tito,

Can you explain where you're getting the "total count" from? Is this the total number of rows
emitted by each view after all views have finished processing?

What do you mean by "Genesis case" - do you mean building a view for the first time?

Thanks,
Joan

----- Original Message -----
From: "Tito Ciuro" <tciuro@mac.com>
To: user@couchdb.apache.org
Sent: Tuesday, October 28, 2014 1:32:37 PM
Subject: How does indexing really work?

Hello,

I’m a bit confused about how CouchDB really works. I just launched Futon and see that the
indexer is busy working on a design document. I have almost a million documents.

A few minutes later, I see three more tasks appearing, all belonging to different design documents.
No problem, except that the total counts are all different:

- design doc 1: ~950,000
- design doc 2: ~450,000
- design doc 3: ~313,000
- design doc 4: ~85,000

Why are the total counts different? My understanding is/was that a database holds N documents.
Each indexing function is passed a document and then checks whether it is the doc_type it
expects:

function(doc) {
    // Emit one row per customer document, keyed by its _id.
    if (doc.Type == "customer") {
        emit(doc._id, {LastName: doc.LastName, FirstName: doc.FirstName});
    }
}
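To make the "each function is passed a document" model concrete, here is a minimal sketch that runs the map function above over a few documents. It assumes a hypothetical local `emit()` standing in for the one CouchDB's JavaScript query server provides, plus a hypothetical `runMap` helper; neither is CouchDB's actual implementation. It shows that every document is visited, but only the matching ones produce rows.

```javascript
// Hypothetical stand-in for the emit() that CouchDB's JavaScript query
// server provides to map functions; here it just collects rows locally.
let emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// The map function from the message above, unchanged.
function customerMap(doc) {
  if (doc.Type == "customer") {
    emit(doc._id, { LastName: doc.LastName, FirstName: doc.FirstName });
  }
}

// Hypothetical helper: call the map function once per document and gather
// what it emits. Every document is visited; only customers produce rows.
function runMap(mapFn, docs) {
  emitted = [];
  for (const doc of docs) {
    mapFn(doc);
  }
  return emitted;
}

const sampleDocs = [
  { _id: "c1", Type: "customer", LastName: "Doe", FirstName: "Jane" },
  { _id: "o1", Type: "order", Total: 42 },
  { _id: "c2", Type: "customer", LastName: "Roe", FirstName: "John" },
];

const rows = runMap(customerMap, sampleDocs);
console.log(sampleDocs.length, rows.length); // 3 2
```

In this sketch, the number of documents visited (3) and the number of rows emitted (2) are different quantities, so it matters which one a progress counter reports.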

In the Genesis case, I was assuming that each view would have to go through every document
in the database and index its own doc_type: basically, one full pass over all N documents for
each design document. For example, if the database contains 100,000 documents and two design
documents, there would be two active tasks listed:

- _design/customers => index 100,000 documents
- _design/orders => index 100,000 documents
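The first-time ("Genesis") model described above can be sketched as one full pass over all documents per design document. This is a simplified illustration under stated assumptions: `buildFromScratch` and the design-document shape are hypothetical, and map functions receive `emit` as a parameter rather than as CouchDB's global. It does mirror one real CouchDB behavior: all views in the same design document are indexed together, so they share a single pass.

```javascript
// Sketch of a first-time build: one full pass over every document per
// design document. Map functions take emit as a parameter here, which is a
// simplification; CouchDB supplies emit() as a global.
function buildFromScratch(designDoc, docs) {
  const index = {};
  for (const name of Object.keys(designDoc.views)) index[name] = [];
  let visited = 0;
  for (const doc of docs) {
    visited++;
    // All views of one design doc see the document during the same pass.
    for (const [name, mapFn] of Object.entries(designDoc.views)) {
      mapFn(doc, (key, value) => index[name].push({ key, value }));
    }
  }
  return { index, visited };
}

// Hypothetical data: 100,000 documents, half customers and half orders.
const allDocs = [];
for (let i = 0; i < 100000; i++) {
  allDocs.push({ _id: "doc" + i, Type: i % 2 === 0 ? "customer" : "order" });
}

const designDocs = {
  "_design/customers": {
    views: { all: (doc, emit) => { if (doc.Type === "customer") emit(doc._id, null); } },
  },
  "_design/orders": {
    views: { all: (doc, emit) => { if (doc.Type === "order") emit(doc._id, null); } },
  },
};

const results = {};
for (const [id, ddoc] of Object.entries(designDocs)) {
  const { index, visited } = buildFromScratch(ddoc, allDocs);
  results[id] = { visited, rows: index.all.length };
  console.log(id, visited, index.all.length);
}
// _design/customers 100000 50000
// _design/orders 100000 50000
```

Under these assumptions, both design documents visit all 100,000 documents even though each emits rows for only half of them.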

Later on, indexing would be incremental, and only the delta (say 9,000 docs) would have to be
reindexed by each view:

- _design/customers => index 9,000 documents
- _design/orders => index 9,000 documents
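The incremental case can be sketched with sequence numbers, which is how CouchDB tracks changes: each write gets an increasing sequence, and each design document's index remembers the last sequence it processed. `ToyDb` and `ViewIndex` below are hypothetical simplifications, not CouchDB internals. Besides the delta itself, the sketch shows that two indexes checkpointed at different sequences report different amounts of pending work against the same database.

```javascript
// Minimal sketch, assuming every write gets an increasing sequence number
// and each design doc's index checkpoints the last sequence it processed.
// ToyDb and ViewIndex are hypothetical, not CouchDB internals.
class ToyDb {
  constructor() {
    this.seq = 0;
    this.changes = []; // one entry per document, at its latest sequence
  }
  put(doc) {
    this.seq++;
    this.changes = this.changes.filter((c) => c.doc._id !== doc._id);
    this.changes.push({ seq: this.seq, doc });
  }
  changesSince(since) {
    return this.changes.filter((c) => c.seq > since);
  }
}

class ViewIndex {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.lastSeq = 0;           // checkpoint: last sequence folded in
    this.rowsByDoc = new Map(); // docId -> rows that doc emitted
  }
  // Pending work is measured from this index's own checkpoint.
  pending(db) {
    return db.changesSince(this.lastSeq).length;
  }
  // Re-run the map function only on documents changed since the checkpoint.
  update(db) {
    const changes = db.changesSince(this.lastSeq);
    for (const { seq, doc } of changes) {
      const rows = [];
      this.mapFn(doc, (key, value) => rows.push({ key, value }));
      // Replace whatever this document emitted previously.
      if (rows.length > 0) this.rowsByDoc.set(doc._id, rows);
      else this.rowsByDoc.delete(doc._id);
      this.lastSeq = seq;
    }
    return changes.length;
  }
  rowCount() {
    let n = 0;
    for (const rows of this.rowsByDoc.values()) n += rows.length;
    return n;
  }
}

const db = new ToyDb();
for (let i = 0; i < 100; i++) {
  db.put({ _id: "doc" + i, Type: i % 4 === 0 ? "customer" : "order" });
}

const customers = new ViewIndex((doc, emit) => {
  if (doc.Type === "customer") emit(doc._id, null);
});
const orders = new ViewIndex((doc, emit) => {
  if (doc.Type === "order") emit(doc._id, null);
});

customers.update(db); // customers catches up; orders has never been built
for (let i = 100; i < 109; i++) db.put({ _id: "doc" + i, Type: "order" });

console.log(customers.pending(db), orders.pending(db)); // 9 109
```

In this sketch the already-built index only has the 9 new changes to process, while the never-built one still faces all 109.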

This doesn’t seem to be the case. I’d love to know how indexing really works.

Thanks!

— Tito
