Mailing-List: couchdb-user@incubator.apache.org
Subject: RE: Largest CouchDB dbs?
Date: Mon, 3 Nov 2008 09:42:56 -0500
From: "Jonathan Ginter"

My apologies for the overloaded use of the term "incubation". I realize it
has a special meaning for Apache projects. My bad.

Thanks for all of the quick responses. It's a sign of a well-run project.

I will keep my eye on the progress of CouchDB. Hopefully, it will rapidly
reach the scalability point that I am looking for.

Jonathan

-----Original Message-----
From: Jan Lehnardt [mailto:jan@apache.org]
Sent: Monday, November 03, 2008 8:50 AM
To: couchdb-user@incubator.apache.org
Subject: Re: Largest CouchDB dbs?

On Nov 3, 2008, at 14:40, Jonathan Ginter wrote:

> From what I have read, it sounds like the project is not yet ready to
> scale this large, but there are plans in place to do so (faster view
> parsers, partitioning, etc.). Is there a rough target for this work? We
> have a roadmap for upcoming projects and I need to know whether CouchDB
> can be considered for the short term (i.e., within the next 4 - 6
> months) or whether we will have to give it more time to incubate and
> come back to it later on in the longer term.

No ETA, but feel free to sponsor development :) The two biggest boosts for
view generation are (as you correctly identified) JSON serialisation on the
Erlang end and actually making use of MapReduce's parallel nature. At the
moment, view creation is single-threaded and limited to a single core on
your system.

Just to avoid potential misunderstanding: Incubation is the process of
becoming an Apache project. It has nothing to do with the software
development roadmap.
Cheers
Jan
--

> Jonathan
>
> -----Original Message-----
> From: Damien Katz [mailto:damien@apache.org]
> Sent: Monday, November 03, 2008 6:00 AM
> To: couchdb-user@incubator.apache.org
> Subject: Re: Largest CouchDB dbs?
>
> On Nov 3, 2008, at 4:38 AM, Jan Lehnardt wrote:
>
>> On Nov 3, 2008, at 05:53, Jonathan Ginter wrote:
>>
>>> I have a similar issue. I am interested in using CouchDB to host a
>>> 200+ GB database that will receive well over 200 million documents
>>> per day. Moreover, the data must roll out - i.e., constant
>>> background purging - and also support UI queries. And this is just
>>> a starting point to match the abilities of the relational database
>>> we are already running. I will want the DB to scale up from there.
>>>
>>> If there is no hope of CouchDB being able to handle all of that
>>> - regardless of how many machines we deploy - I would like to know
>>> that now before I look any further into this project.
>>
>>> Does anyone have a reasonable idea about whether CouchDB will be
>>> capable of such massive scalability or how many machines it would
>>> take to scale that large?
>>
>> This sounds like a scenario that CouchDB will ultimately be able to
>> handle nicely. I don't think we can give out any guarantees about
>> when and how this will be the case. Maintaining a 200+ GB data set
>> would require quite some hand-wiring at the moment.
>>
>>> I would appreciate any feedback that anyone might have on this.
>>
>> I think Damien can chime in here :) Damien?
>
> This is definitely well within what CouchDB should be able to do once
> partitioning is in place. I'm not really working on this yet, but
> there are a lot of people and companies interested in seeing the
> partitioning work done. So maybe some progress will be made soon.
>
> -Damien