Subject: Re: Performance of many documents vs large documents
From: Dave Cottlehuber
To: user@couchdb.apache.org
Date: Wed, 11 Jan 2012 12:51:15 +0100

On 11 January 2012 04:07, Mahesh Paolini-Subramanya wrote:
> With a (somewhat. kinda. sorta. maybe.) similar requirement, I ended up doing this as follows:
>     (1) created a 'daily' database that data got dumped into in very small increments - approximately 5 docs/second
>     (2) uni-directionally replicated the documents out of this database into a 'reporting' database that I could suck data out of
>     (3) sucked data out of the reporting database at 15-minute intervals, processed them somewhat, and dumped all of *those* into one single (highly sharded) BigCouch db
>
> The advantages here were:
>     - My data was captured in the format best suited for the data-generating events (minimum processing of the event data), thanx to (1)
>     - The processing of this data did not impact the writing of the data, thanx to (2), allowing for maximum throughput
>     - I could compact and archive the 'daily' database every day, thus significantly minimizing disk space, thanx to (1). Also, we only retain the 'daily' data for 3 months, since anything beyond that is stale (for our purposes. YMMV)
>     - The collated data that ends up in BigCouch per (3) is much *much* smaller. But if we end up needing a different collation (and yes, that happens every now and then), I can just rerun the reporting process (up to the last 3 months, of course). In fact, I can have multiple collations running in parallel...
>
> Hope this helps. If you need more info, just ping me...
>
> Cheers
>
> Mahesh Paolini-Subramanya
> That Tall Bald Indian Guy...
> Google+  | Blog  | Twitter
>
> On Jan 11, 2012, at 4:13 AM, Martin Hewitt wrote:
>
>> Hi all,
>>
>> I'm currently scoping a project which will measure a variety of indicators over a long period, and I'm trying to work out where to strike the balance of document number vs document size.
>>
>> I could have one document per metric, leading to a small number of documents, but with each document containing ticks for every 5-second interval of any given day, these documents would quickly become huge.
>>
>> Clearly, I could decompose these huge per-metric documents down into smaller documents, and I'm in the fortunate position that, because I'm dealing with time, I can decompose by year, month, day, hour, minute or even second.
>>
>> Going all the way to second level would clearly create a huge number of documents, but all of very small size, so that's the other extreme.
>>
>> I'm aware the usual response to this is "somewhere in the middle", which is my working hypothesis (decomposing to a "day" level), but I was wondering if there was a) anything in CouchDB's architecture that would make one side of the "middle" more suited, or b) if someone has experience architecting something like this.
>>
>> Any help gratefully appreciated.
>>
>> Martin
>
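A minimal sketch of the pipeline Mahesh outlines in (1)-(3) above, in Python against CouchDB's plain HTTP API. The hostnames, database names, polling loop and the collate() roll-up are assumptions for illustration only; the _replicate, _changes and _bulk_docs endpoints are the standard API, but this is not Mahesh's actual code:

    # daily -> reporting -> BigCouch pipeline sketch
    import time
    import requests

    COUCH     = "http://127.0.0.1:5984"
    REPORTING = COUCH + "/reporting"
    BIGCOUCH  = "http://bigcouch.example.com:5984/collated"  # hypothetical target

    def start_replication():
        # (2) continuous one-way replication out of the write-heavy 'daily' db
        requests.post(COUCH + "/_replicate",
                      json={"source": "daily", "target": "reporting",
                            "continuous": True})

    def collate(docs):
        # (3) stand-in for whatever 15-minute roll-up you actually need
        return [{"type": "rollup", "ts": int(time.time()), "count": len(docs)}]

    def run_batch(since):
        # pull everything new from 'reporting' since the last checkpoint
        changes = requests.get(REPORTING + "/_changes",
                               params={"since": since,
                                       "include_docs": "true"}).json()
        docs = [r["doc"] for r in changes["results"] if r.get("doc")]
        if docs:
            requests.post(BIGCOUCH + "/_bulk_docs", json={"docs": collate(docs)})
        return changes["last_seq"]

    if __name__ == "__main__":
        start_replication()
        seq = 0
        while True:
            seq = run_batch(seq)
            time.sleep(15 * 60)  # (3) every 15 minutes

Because the batch job only ever reads from 'reporting', the 'daily' database stays free for fast writes and can be compacted or archived on its own schedule, which is the point of step (1).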
Simon & Mahesh,

These examples would be a great addition to the wiki :-))

A+
Dave
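On Martin's granularity question, a sketch of the "day" level split he mentions: one document per metric per day, holding that day's 5-second samples keyed by second-of-day, so each document tops out at 86400/5 = 17,280 entries. The 'metrics' database, the id scheme and the field names here are assumptions, not an established convention:

    import requests

    DB = "http://127.0.0.1:5984/metrics"

    def doc_id(metric, day):
        # deterministic id, e.g. "cpu_load:2012-01-11", so a given day's
        # samples always land in the same document
        return "%s:%s" % (metric, day)

    def add_sample(metric, day, seconds_since_midnight, value):
        url = "%s/%s" % (DB, doc_id(metric, day))
        resp = requests.get(url)
        doc = resp.json() if resp.status_code == 200 else \
              {"metric": metric, "day": day, "samples": {}}
        # one entry per 5-second tick
        doc["samples"][str(seconds_since_midnight)] = value
        # naive fetch-modify-put; concurrent writers would need to retry on 409
        requests.put(url, json=doc)

    # add_sample("cpu_load", "2012-01-11", 43515, 0.73)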