From: Metin Akat
To: user@couchdb.apache.org
Date: Thu, 4 Mar 2010 13:23:43 +0200
Subject: Re: couchdb for genome data
Message-ID: <98a246a11003040323m49223f22i6d6d2310ca04944f@mail.gmail.com>
In-Reply-To: <4B8F971A.2080909@gmail.com>
References: <4B8EE0DE.6000205@gmail.com> <3285B6DD-83C4-4F97-8CA8-DBDFFBC45B45@googlemail.com> <4B8F971A.2080909@gmail.com>

No, there is no limit. And you can structure them in subdirectories by
using a slash in the database name. For example, "your/name/here" will
create 2 subdirectories with a database called "here" at the bottom.

On Thu, Mar 4, 2010 at 1:18 PM, Tom Sante wrote:
> Thanks. Are there any limits to the number of databases in couchdb? A few
> 1000 probably won't be a problem I guess?
>
> On 4/03/10 11:49, Simon Metson wrote:
>>
>> Hi,
>> Why not use a database per experiment? Do you need to process data
>> across experiments? Can you store your raw data in individual databases
>> and then pull summary data into a single database?
>> Cheers
>> Simon
>>
>> On 3 Mar 2010, at 22:21, Tom Sante wrote:
>>
>>> Hi
>>>
>>> The data is now stored in a mysql table with about a billion (1000
>>> million) rows.
>>> These rows are the data of a genetic test (arrayCGH) and are built up
>>> like this:
>>>
>>> Every experiment (a few thousand of them total) contains measurements
>>> of about 180000 genetic probes. This raw data will be analyzed and the
>>> values run through different algorithms, so every probe needs to store
>>> more than 1 value after the analysis is done.
>>> The values of the different analyses are now stored in columns in
>>> that table, making it a pain if we have to add an analysis that is
>>> not yet part of the existing columns. This is why a schema-free,
>>> document-based DB is probably a better fit.
>>> The initial idea was to give each probe a separate document, and when
>>> the original value is transformed into another value, to store this in
>>> the same document.
>>>
>>> {
>>>   "probe_id" : 1234567890,
>>>   "experiment_id" : 1234567890,
>>>   "raw_value" : 0.43524,
>>>   "analysis": { "cbs" : 0.436, "CBS+GLAD" : 0.4356 }
>>> }
>>>
>>> Once added to the database almost all changes to the data will be
>>> contained within an experiment.
>>>
>>> MongoDB has something like collections that would be an appropriate
>>> abstraction for an experiment. But in couchdb I would have to add all
>>> these probe documents in 1 big database without collections. So if I
>>> only make changes to probes within an experiment, this would influence
>>> the views of all the other billion documents in the db. Because of the
>>> large number of documents it would be good to know beforehand what the
>>> performance implications of this are.
>>>
>>> Any suggestions are welcome.
>>>
>>> Tom
>>
>
>
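
To make the database-per-experiment idea from this thread a bit more
concrete, here is a rough Python sketch. It only builds names, URLs and
document bodies and does not talk to a server. The "arraycgh/experiments"
prefix, the helper names and the "_id" scheme are my own assumptions for
illustration, not something from the thread. The name check follows
CouchDB's rule that a database name starts with a lowercase letter and may
contain lowercase letters, digits and _ $ ( ) + - /.

```python
import re
from urllib.parse import quote

# CouchDB database names: lowercase letter first, then lowercase letters,
# digits, or any of _ $ ( ) + - /
_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def experiment_db_name(experiment_id, prefix="arraycgh/experiments"):
    """Hypothetical naming scheme: one database per experiment.

    The slashes make CouchDB place the database file in subdirectories.
    """
    name = f"{prefix}/{experiment_id}"
    if not _DB_NAME.match(name):
        raise ValueError(f"not a valid CouchDB database name: {name!r}")
    return name


def db_url(base_url, db_name):
    """URL of a database; a slash in the name must be sent as %2F."""
    return f"{base_url.rstrip('/')}/{quote(db_name, safe='')}"


def probe_doc(probe_id, raw_value, analysis=None):
    """Probe document for a per-experiment database (assumed layout).

    experiment_id is left out because the database itself identifies
    the experiment.
    """
    return {
        "_id": f"probe-{probe_id}",  # assumed id scheme
        "probe_id": probe_id,
        "raw_value": raw_value,
        "analysis": dict(analysis or {}),
    }


if __name__ == "__main__":
    name = experiment_db_name(1234567890)
    print(db_url("http://localhost:5984", name))
    print(probe_doc(1234567890, 0.43524, {"cbs": 0.436, "CBS+GLAD": 0.4356}))
```

A PUT to the URL from db_url() would create the database, and the probe
documents could then be loaded in batches with a POST to its _bulk_docs
endpoint. Adding a new analysis later is just another key under "analysis",
and the views of the other experiments' databases are not affected.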