Subject: Re: Why I think view generation should be done concurrent.
From: J Chris Anderson
Date: Sun, 4 Jul 2010 07:10:12 -0700
To: user@couchdb.apache.org

On Jul 4, 2010, at 2:36 AM, Julian Moritz wrote:

> Hi,
>
> a few days ago I tweeted a wish to have view generation done
> concurrently. I'll tell you why (because @janl doesn't think so).
>
> I've got some documents in the form of:
>
>   _id: 1,
>   _rev: 3-abc,
>   url: http://www.abc.com,
>   hrefs: [http://www.xyz.com,
>           http://www.nbc.com,
>           ...,
>           ...,
>           ...]
>
> As you can imagine, since I'm crawling the web, I've got plenty of them,
> and thousands more every second. I've got a view; map.py is:
>
> def fun(doc):
>     h = hash  # str hashes vary per process in Python 3.3+ unless PYTHONHASHSEED is set
>     if "hrefs" in doc:
>         for href in doc["hrefs"]:
>             # key is (hash, url): sorting on the hash gives a pseudo-random order
>             yield (h(href), href), None
>
> reduce.py is:
>
> def fun(key, value, rereduce):
>     return True
>

You should remove this reduce function. It's not doing you any good and
it's burning up your CPU. Things will be much faster without it.
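A minimal sketch of what a map-only view could look like from the client side. CouchDB returns view rows sorted by key, so identical (hash, url) keys always arrive next to each other, and a reader can collapse duplicates while iterating instead of relying on a reduce. The `docs` list is hypothetical sample data, `map_doc` only emulates the map function locally (it is not the CouchDB view server), Python's tuple sort stands in for CouchDB's key collation, and `stable_hash` (zlib.crc32) is swapped in for the built-in `hash` so the order is reproducible across processes.

```python
import zlib
from itertools import groupby


def stable_hash(url):
    # Hypothetical stand-in for hash(): crc32 is identical in every process,
    # unlike str hashes under Python 3.3+ hash randomization.
    return zlib.crc32(url.encode("utf-8"))


def map_doc(doc):
    # Local emulation of the quoted map function: one (key, value) row per href.
    for href in doc.get("hrefs", []):
        yield (stable_hash(href), href), None


def unique_urls(sorted_rows):
    # Rows are sorted by key, so duplicates are contiguous; groupby collapses
    # each run of equal keys into a single URL, keeping the hash order.
    return [key[1] for key, _ in groupby(sorted_rows, key=lambda row: row[0])]


# Hypothetical sample documents in the shape described in the message.
docs = [
    {"_id": "1", "url": "http://www.abc.com",
     "hrefs": ["http://www.xyz.com", "http://www.nbc.com"]},
    {"_id": "2", "url": "http://www.xyz.com",
     "hrefs": ["http://www.nbc.com", "http://www.abc.com"]},
]

# Emulate the view index: every emitted row, sorted by key.
rows = sorted((row for doc in docs for row in map_doc(doc)),
              key=lambda row: row[0])
urls = unique_urls(rows)
print(urls)
```

If the view was being queried with `group=true`, the reduce is what made the list unique on the server; the sketch only shows that key order already makes duplicates adjacent, so skipping them on the client is a single linear pass.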
Chris

> If you're not able to read Python code: it's generating a large list of
> unique, pseudo-randomly ordered URLs. I'm calling this view quite often
> (to get new URLs to crawl).
>
> What is my problem now? My CouchDB process is at 100% CPU and the view
> sometimes takes quite long to generate (even though I've only got about
> 5-10 GB of test data). I've got 4 cores and 3 of them are sleeping. I
> think it could be much faster if every core were used. What does
> CouchDB do on a very large system, let's say 64 Atom cores (which would
> sit idle in an energy-saving mode) and 20 TB of data? Use 1 core at,
> let's say, 1 GHz to munch through 20 TB? Oh please.
>
> Why doesn't CouchDB use all cores to generate views?
>
> Regards
> Julian
>
> P.S.: Maybe I'm totally wrong and the way you do it is right, but ATM it
> makes me mad to see one core out of four working while the rest are idle.
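Julian's argument rests on the map step being independent per document. A minimal sketch of that idea, which is not how CouchDB builds its indexes: split the documents into partitions, map each partition into its own sorted run, then k-way merge the runs with `heapq.merge` so the combined rows come out in key order. `emit_rows`, `map_partition`, `build_view` and the `pool_map` parameter are hypothetical names. By default the partitions are processed one after another with the built-in `map`; passing a process pool's `map` is what would spread them across cores. `zlib.crc32` replaces `hash` because str hashes differ between Python 3.3+ processes, which would scramble keys computed in different workers.

```python
import heapq
import zlib


def emit_rows(doc):
    # Per-document map step: it does not depend on any other document.
    for href in doc.get("hrefs", []):
        yield (zlib.crc32(href.encode("utf-8")), href), None


def map_partition(docs):
    # Map one partition and return its rows as a run sorted by key.
    rows = [row for doc in docs for row in emit_rows(doc)]
    rows.sort(key=lambda row: row[0])
    return rows


def build_view(docs, n_partitions=4, pool_map=map):
    # Deal the documents round-robin into n_partitions independent chunks.
    partitions = [docs[i::n_partitions] for i in range(n_partitions)]
    # The calls are independent, so pool_map may run them on separate cores.
    runs = list(pool_map(map_partition, partitions))
    # A k-way merge of the sorted runs yields one key-ordered index.
    return list(heapq.merge(*runs, key=lambda row: row[0]))


# Hypothetical sample data: ten documents linking to three distinct URLs.
sample = [{"_id": str(i), "hrefs": ["http://example.com/%d" % (i % 3)]}
          for i in range(10)]
index = build_view(sample)
print(len(index), "rows")
```

Passing `pool_map=executor.map` from a `concurrent.futures.ProcessPoolExecutor` (inside an `if __name__ == "__main__":` guard) would run the partitions in separate processes. The final merge still runs in one process, so this only parallelizes the map step.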