From: Tito Ciuro
To: user@couchdb.apache.org, Joan Touzet
Subject: Re: How does indexing really work?
Date: Tue, 28 Oct 2014 10:56:42 -0700

Hello Joan,

I’m getting this information in two places:

- Futon’s “Status” page
- CouchDB’s /_active_tasks payload

I know there are ~950,000 documents in the database. This number appears on the /_utils main page. What I don’t understand is why the total number of documents differs across the active tasks reported via the Status page or /_active_tasks. Each active task has a different total number of docs to be processed.

Yes, the Genesis case is the initial case, where CouchDB hasn’t had the opportunity to index anything.

Thanks,

-- Tito

> On Oct 28, 2014, at 10:50 AM, Joan Touzet wrote:
>
> Hi Tito,
>
> Can you explain where you're getting the "total count" from? Is this the total number of rows emitted by each view after all views have finished processing?
>
> What do you mean by "Genesis case" - do you mean building a view for the first time?
>
> Thanks,
> Joan
>
> ----- Original Message -----
> From: "Tito Ciuro"
> To: user@couchdb.apache.org
> Sent: Tuesday, October 28, 2014 1:32:37 PM
> Subject: How does indexing really work?
>
> Hello,
>
> I’m a bit confused about how CouchDB really works. I just launched Futon and see that the indexer is busy working on a design document. I have almost a million documents.
>
> A few minutes later, I see three more tasks appearing, all belonging to different design documents. No problem, except that the total counts are all different:
>
> - design doc 1: ~950,000
> - design doc 2: ~450,000
> - design doc 3: ~313,000
> - design doc 4: ~85,000
>
> Why are the total counts different? My understanding is/was that a database holds N documents. Each indexing function is passed a document, which it then checks against the doc_type it expects:
>
> function(doc) {
>     if (doc.Type == "customer") {
>         emit(doc._id, {LastName: doc.LastName, FirstName: doc.FirstName});
>     }
> }
>
> In the Genesis case, I was assuming that each view would have to go through each document across the database and index its own doc_type. Basically, one round for each design document for N total documents. For example, if the database contains 100,000 documents and two design documents, there would be two active tasks listed:
>
> - _design/customers => index 100,000 documents
> - _design/orders => index 100,000 documents
>
> Later on, the indexing would be partial and the delta (say 9,000 docs) would have to be reindexed by each view:
>
> - _design/customers => index 9,000 documents
> - _design/orders => index 9,000 documents
>
> This doesn’t seem to be the case. I’d love to know how indexing really works.
>
> Thanks!
>
> -- Tito
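
For reference, the model assumed in the quoted message (on a first build, each design document's map function is offered every document in the database) can be sketched as a small simulation. This is a toy illustration only, not CouchDB's indexer: `buildView`, the sample documents, the `orders` map function, and passing `emit` as a second argument are all assumptions made for the sketch. It does not explain why the totals reported by the active tasks differ.

```javascript
// Toy model only: this is not CouchDB's indexer.
// `buildView` and the sample documents are hypothetical, for illustration.
function buildView(docs, mapFn) {
  const rows = [];
  // In CouchDB emit() is a global; this sketch passes it in as an argument.
  const emit = (key, value) => rows.push({ key, value });
  let processed = 0;
  for (const doc of docs) {
    mapFn(doc, emit); // every document is offered to the map function
    processed += 1;
  }
  return { processed, rows };
}

// Hypothetical sample data: three customers and two orders.
const docs = [
  { _id: "c1", Type: "customer", LastName: "Doe", FirstName: "Ann" },
  { _id: "c2", Type: "customer", LastName: "Roe", FirstName: "Bob" },
  { _id: "c3", Type: "customer", LastName: "Poe", FirstName: "Cy" },
  { _id: "o1", Type: "order", Total: 10 },
  { _id: "o2", Type: "order", Total: 25 },
];

// Same shape as the map function in the quoted message.
const customersMap = (doc, emit) => {
  if (doc.Type == "customer") {
    emit(doc._id, { LastName: doc.LastName, FirstName: doc.FirstName });
  }
};

// Hypothetical second view, standing in for _design/orders.
const ordersMap = (doc, emit) => {
  if (doc.Type == "order") {
    emit(doc._id, { Total: doc.Total });
  }
};

const customers = buildView(docs, customersMap);
const orders = buildView(docs, ordersMap);

console.log(customers.processed, customers.rows.length); // 5 3
console.log(orders.processed, orders.rows.length);       // 5 2
```

Under this model both views process the same number of documents and differ only in the number of rows they emit, which is the distinction Joan's question draws between documents processed and rows emitted.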