From: Tracy Flynn
To: user@couchdb.apache.org
Subject: Volume Test - 2 million documents
Date: Tue, 12 Oct 2010 20:16:13 -0400
Thanks for all the previous help.

For both parts, documents contain about 30 fields of metadata and primary content of about 5K to 10K.

The aim is to prove the feasibility of moving all our syndication services to a common platform that provides rapid customization for customer-specific syndication feeds.

Part 1
------

I've already done a successful proof-of-concept with 100K documents, with no optimization.

A couple of things I noticed.

Environment: my laptop (a recent, loaded MacBook Pro - 2.93 GHz Intel Core 2 Duo, 8 GB memory)

- 100K docs load took about 1 hour
- Creating a single view with 'emit([single key], doc)' took about 1 hour
- The log indicated view checkpoints every 30 sequence numbers or so

Part 2
------

I'm about to do a volume test of about 2 million documents.

Primary load
------------

I will be running in batches of about 1000 documents.

Three separate unix servers on a local network:

- One for couchdb instance
- One for feeder process
- One for database

View definition
---------------

I have two views defined, without any reduce functions.

Questions for Part 2
--------------------

Firstly, any thoughts or hints on my larger benchmark (Part 2)?

Is it naive to hope to speed up the first creation of the view by using map functions of the form 'emit([key], null)' and then using 'include_docs' on queries?

Is there any way to control the checkpointing of views when creating the view for the first time? I'm guessing I'm looking at many hours to create a single view on 2 million documents.

Any help would be appreciated.

Regards,

Tracy
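
For reference, a minimal sketch of the setup described above: a view that emits a null value, loading in batches of about 1000 through _bulk_docs, and a query that uses include_docs. The design document name "syndication", the view name "by_key", the field doc.key, and the helpers toBulkBatches and viewPath are all hypothetical placeholders, not taken from the actual application. Emitting null keeps the document body out of the view index. Whether that shortens the initial build on this data set is something the volume test would have to show.

```javascript
// Hypothetical design document. "syndication", "by_key" and doc.key are
// placeholder names chosen for illustration only.
const designDoc = {
  _id: "_design/syndication",
  views: {
    by_key: {
      // Emit only the key; the value is null, so the document body is not
      // copied into the view index.
      map: "function (doc) { if (doc.key) { emit([doc.key], null); } }"
    }
  }
};

// Split an array of documents into request bodies for POST /{db}/_bulk_docs.
// Each body has the shape { docs: [...] }.
function toBulkBatches(docs, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push({ docs: docs.slice(i, i + batchSize) });
  }
  return batches;
}

// Build the path for a view query that pulls in the full documents at
// read time via include_docs=true. The key is JSON-encoded.
function viewPath(db, key) {
  const params = new URLSearchParams({
    key: JSON.stringify([key]),
    include_docs: "true"
  });
  return `/${db}/_design/syndication/_view/by_key?${params}`;
}
```

Each object returned by toBulkBatches would be sent as the JSON body of one _bulk_docs request, and designDoc would be saved to the database like any other document before the first query.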