couchdb-user mailing list archives

From Tito Ciuro <tci...@mac.com>
Subject Re: How does indexing really work?
Date Tue, 28 Oct 2014 17:56:42 GMT
Hello Joan,

I’m getting this information in two places:

- Futon’s “Status” page
- CouchDB’s /_active_tasks payload

I know there are ~950,000 documents in the database. This number appears on the /_utils main
page. What I don’t understand is why the total number of documents differs from one active
task to the next, whether I look at the Status page or at /_active_tasks. Each active task
reports a different total number of docs to be processed.
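
For reference, here is a minimal sketch of how the per-task totals could be pulled out of a parsed /_active_tasks payload. It assumes the indexer entries carry the fields type, design_document, changes_done and total_changes, as described in the CouchDB documentation. The helper name summarizeIndexerTasks, the database name "mydb" and all of the numbers are made up for illustration.

```javascript
// Hypothetical helper: keeps only view-indexer tasks and reports, per design
// document, how many changes are done and how many the task expects in total.
function summarizeIndexerTasks(tasks) {
  return tasks
    .filter((task) => task.type === "indexer")
    .map((task) => ({
      designDocument: task.design_document,
      changesDone: task.changes_done,
      totalChanges: task.total_changes,
      remaining: task.total_changes - task.changes_done,
    }));
}

// Sample payload with invented values, shaped like a parsed /_active_tasks response.
const samplePayload = [
  { type: "indexer", database: "mydb", design_document: "_design/customers",
    changes_done: 1000, total_changes: 950000 },
  { type: "indexer", database: "mydb", design_document: "_design/orders",
    changes_done: 500, total_changes: 450000 },
  { type: "database_compaction", database: "mydb",
    changes_done: 10, total_changes: 20 },
];

const summary = summarizeIndexerTasks(samplePayload);
for (const row of summary) {
  console.log(row.designDocument, row.changesDone, "/", row.totalChanges);
}
```

Comparing total_changes across tasks this way should at least make it easy to see which design document reports which total.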

Yes, by “Genesis case” I mean the initial case, where CouchDB hasn’t had the opportunity
to index anything yet.

Thanks,

— Tito

> On Oct 28, 2014, at 10:50 AM, Joan Touzet <wohali@apache.org> wrote:
> 
> Hi Tito,
> 
> Can you explain where you're getting the "total count" from? Is this the total number
> of rows emitted by each view after all views have finished processing?
> 
> What do you mean by "Genesis case" - do you mean building a view for the first time?
> 
> Thanks,
> Joan
> 
> ----- Original Message -----
> From: "Tito Ciuro" <tciuro@mac.com>
> To: user@couchdb.apache.org
> Sent: Tuesday, October 28, 2014 1:32:37 PM
> Subject: How does indexing really work?
> 
> Hello,
> 
> I’m a bit confused about how CouchDB really works. I just launched Futon and see that
> the indexer is busy working on a design document. I have almost a million documents.
> 
> A few minutes later, I see three more tasks appearing, all belonging to different design
> documents. No problem, except that the total count is all different:
> 
> - design doc 1: ~950,000
> - design doc 2: ~450,000
> - design doc 3: ~313,000
> - design doc 4: ~85,000
> 
> Why are the total counts different? My understanding is/was that a database holds N documents.
> Each indexing function is passed a document and then checks whether it’s the doc_type
> it expects:
> 
> function(doc) {
>   if (doc.Type == "customer") {
>     emit(doc._id, {LastName: doc.LastName, FirstName: doc.FirstName});
>   }
> }
> 
> In the Genesis case, I was assuming that each view would have to go through each document
> across the database and index its own doc_type. Basically, one round for each design document
> for N total documents. For example, if the database contains 100,000 documents and two design
> documents, there would be two active tasks listed:
> 
> - _design/customers => index 100,000 documents
> - _design/orders => index 100,000 documents
> 
> Later on, the indexing would be partial and the delta (say 9,000 docs) would have to
> be reindexed by each view:
> 
> - _design/customers => index 9,000 documents
> - _design/orders => index 9,000 documents
> 
> This doesn’t seem to be the case. I’d love to know how indexing really works.
> 
> Thanks!
> 
> — Tito

