Subject: Why I think view generation should be done concurrent.
From: Julian Moritz
To: user@couchdb.apache.org
Date: Sun, 04 Jul 2010 11:36:57 +0200
Message-ID: <1278236217.25065.27.camel@laptop>

Hi,

a few days ago I tweeted a wish to have view generation done concurrently. I'll tell you why (because @janl doesn't think so).

I've got some documents of the form:

    _id: 1,
    _rev: 3-abc,
    url: http://www.abc.com,
    hrefs: [http://www.xyz.com, http://www.nbc.com, ..., ..., ...]

As you can imagine, since I'm crawling the web, I've got plenty of them, and thousands more arrive every second.

I've got a view. map.py is:

    def fun(doc):
        h = hash
        # Documents without outgoing links emit nothing.
        if "hrefs" in doc:
            for href in doc["hrefs"]:
                # Key is (hash, url), so rows are ordered by the hash.
                yield (h(href), href), None

reduce.py is:

    def fun(key, value, rereduce):
        # The value is irrelevant; only the distinct keys matter.
        return True

If you can't read Python code: it generates a large list of unique, pseudo-randomly ordered URLs. I call this view quite often (to get new URLs to crawl).

What is my problem now? My couchdb process is at 100% CPU and the view sometimes takes quite long to generate (even though I've only got about 5-10 GB of test data). I've got 4 cores and 3 of them are sleeping. I think it could be much faster if every core were used.

What does couchdb do on a very large system, let's say 64 Atom cores (which would otherwise sit in an energy-saving idle mode) and 20 TB of data? Use 1 core at, let's say, 1 GHz to munch through 20 TB? Oh please.

Why doesn't couchdb use all cores to generate views?

Regards
Julian

P.S.: Maybe I'm totally wrong and the way you do it is right, but ATM it makes me mad to see one core out of four working while the rest are idle.
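
For readers who want to see what the view in this message computes, here is a minimal local sketch in plain Python. It is not CouchDB's view server: `stable_hash`, `query_grouped` and the sample `docs` are hypothetical names and data made up for illustration, the reduce call is simplified, and `zlib.crc32` stands in for the built-in `hash` so that the ordering is reproducible.

```python
import zlib


def stable_hash(text):
    """Deterministic stand-in (an assumption) for the built-in hash() in map.py."""
    return zlib.crc32(text.encode("utf-8"))


def map_fun(doc):
    """Emit one ((hash, href), None) row per outgoing link, like map.py."""
    if "hrefs" in doc:
        for href in doc["hrefs"]:
            yield (stable_hash(href), href), None


def reduce_fun(keys, values, rereduce):
    """Collapse every group of rows to a single True, like reduce.py."""
    return True


def query_grouped(docs):
    """Roughly mimic a grouped view query: map, group rows by key, reduce."""
    groups = {}
    for doc in docs:
        for key, value in map_fun(doc):
            groups.setdefault(key, []).append(value)
    rows = []
    # Rows are returned sorted by key, so the hash decides the order.
    for key, values in sorted(groups.items()):
        rows.append((key, reduce_fun([key] * len(values), values, False)))
    return rows


# Hypothetical sample documents in the shape described in the message.
docs = [
    {"_id": "1", "url": "http://www.abc.com",
     "hrefs": ["http://www.xyz.com", "http://www.nbc.com"]},
    {"_id": "2", "url": "http://www.xyz.com",
     "hrefs": ["http://www.nbc.com", "http://www.abc.com"]},
    {"_id": "3", "url": "http://www.nbc.com"},  # no hrefs, emits nothing
]

rows = query_grouped(docs)
urls = [href for (_, href), _ in rows]
print(urls)
```

Each URL appears once in `urls` even though `http://www.nbc.com` is linked from two documents, and the order follows the hash rather than the alphabet. One reason the sketch avoids the built-in `hash`: in Python 3, string hashes are randomized per interpreter process by default, so an ordering based on `hash` would not be stable across runs.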