From: Jim Klo
Subject: Scaling CouchDB
Date: Fri, 22 Apr 2011 11:40:21 -0700
To: user@couchdb.apache.org

I'm part of the core Federal Learning Registry dev team [http://www.learningregistry.org], and we're using CouchDB to store and replicate the contents of the registry within our network.

One of the questions that has come up as we start making plans for our initial production release is CouchDB's scalability strategy. We expect that, long term, we are going to have an enormous amount of data from activity streams and metadata inserted into the network, and I'd like to have an idea of what we need to work toward now so there's no big surprise when we start getting close to hitting some limits.

As part of our infrastructure strategy, we've chosen Amazon Web Services EC2 & EBS as our hosting provider for the first rollout. EBS currently has an upper limit of 1TB per volume; other cloud or non-cloud solutions may have similar or different limitations, but right now I'm only concerned with how we might deal with this on EC2 and EBS.

1. Are there CouchDB limits that we are going to run into before we hit 1TB?

2. Is there a strategy for disk spanning to go beyond the 1TB limit by incorporating multiple volumes, or do we need to leverage a solution like BigCouch, which seems to require us to spin up multiple CouchDBs and do some sort of sharding/partitioning of the data? I'm curious how queries that span shards/partitions work, or whether this is transparent.
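For context on question 1: the kind of watchdog we have in mind is roughly the sketch below. It polls CouchDB's database info endpoint and compares the reported disk_size against the EBS ceiling. The node URL, the "registry" database name, and the warning threshold are placeholders, not our real configuration.

    import json
    import urllib.request

    COUCH_URL = "http://localhost:5984"   # placeholder node URL
    DB_NAME = "registry"                  # placeholder database name
    WARN_BYTES = 900 * 1024 ** 3          # warn well before the 1TB EBS ceiling

    def disk_size(db):
        # GET /{db} returns database info, including disk_size in bytes
        with urllib.request.urlopen("%s/%s" % (COUCH_URL, db)) as resp:
            info = json.loads(resp.read().decode("utf-8"))
        return info["disk_size"]

    if __name__ == "__main__":
        size = disk_size(DB_NAME)
        print("%s uses %.1f GB on disk" % (DB_NAME, size / 1024.0 ** 3))
        if size > WARN_BYTES:
            print("WARNING: approaching the EBS volume limit; compact or shard soon")

What I don't know is whether disk_size will be the first thing to hurt, or whether we'll hit view build times or compaction windows long before the volume fills.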
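And to make question 2 concrete, below is the sort of application-level sharding we'd end up building ourselves if the cross-shard part is not transparent: hash each doc id to one of several CouchDB databases on write, then fan a view query out to every shard and merge the rows on read. This is only a sketch under assumptions; the shard URLs, the view path, and the pick_shard helper are hypothetical, not something we have running.

    import hashlib
    import json
    import urllib.request

    # Hypothetical shard layout: each entry is its own CouchDB database,
    # potentially on its own EC2 instance and EBS volume.
    SHARDS = [
        "http://node1:5984/registry_0",
        "http://node2:5984/registry_1",
        "http://node3:5984/registry_2",
    ]

    def pick_shard(doc_id):
        # Deterministically map a document id to one shard.
        h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
        return SHARDS[h % len(SHARDS)]

    def put_doc(doc_id, doc):
        # Write the document to the shard that owns its id.
        url = "%s/%s" % (pick_shard(doc_id), doc_id)
        req = urllib.request.Request(
            url,
            data=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def query_all_shards(view_path):
        # Cross-shard queries are NOT transparent this way: the application
        # has to fan out to every shard and merge the rows itself.
        rows = []
        for shard in SHARDS:
            with urllib.request.urlopen("%s/%s" % (shard, view_path)) as resp:
                rows.extend(json.loads(resp.read().decode("utf-8"))["rows"])
        return rows

If BigCouch already handles that fan-out and merge for us, that's exactly the answer I'm hoping for.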
Thanks,

- Jim

Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International