couchdb-user mailing list archives

From Metin Akat <akat.me...@gmail.com>
Subject Re: couchdb for genome data
Date Thu, 04 Mar 2010 11:23:43 GMT
No, there is no limit.
You can also organize them into subdirectories by using slashes in the
database name. For example, "your/name/here" creates 2 subdirectories,
with a database called "here" at the bottom.
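
A minimal sketch of how such a name could be addressed over HTTP, assuming a
server at http://localhost:5984 (an assumption, not something from this thread).
The slash in the database name has to be percent-encoded as %2F in the request
URL. The helper name `database_url` is hypothetical.

```python
from urllib.parse import quote


def database_url(server: str, db_name: str) -> str:
    """Return the URL of a CouchDB database, percent-encoding slashes in its name."""
    # safe="" makes quote() encode "/" as %2F instead of leaving it untouched.
    return server.rstrip("/") + "/" + quote(db_name, safe="")


# The server address is an assumption; the database name is the one from the message.
print(database_url("http://localhost:5984", "your/name/here"))
```

A PUT request to the resulting URL would create the database; the sketch only
builds the URL so that it runs without a server.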

On Thu, Mar 4, 2010 at 1:18 PM, Tom Sante <tom.sante@gmail.com> wrote:
> Thanks. Are there any limits to the number of databases in couchdb? A few
> thousand probably won't be a problem, I guess?
>
> On 4/03/10 11:49, Simon Metson wrote:
>>
>> Hi,
>> Why not use a database per experiment? Do you need to process data
>> across experiments? Can you store your raw data in individual databases
>> and then pull summary data into a single database?
>> Cheers
>> Simon
>>
>> On 3 Mar 2010, at 22:21, Tom Sante wrote:
>>
>>> Hi
>>>
>>> The data is now stored in a mysql table with about a billion (1000
>>> million) rows.
>>> These rows are the data of a genetic test (arrayCGH) and are structured
>>> like this:
>>>
>>> Every experiment (a few thousand of them total) contains measurements
>>> of about 180000 genetic probes. This raw data will be analyzed and the
>>> values run through different algorithms, so every probe needs to store
>>> more than 1 value after the analysis is done. The values of the different
>>> analyses are now stored in columns of that table, which makes it a pain
>>> to add an analysis that is not yet among the existing columns. This is
>>> why a schema-free, document-based DB is probably a better fit.
>>> The initial idea was to give each probe a separate document, and when
>>> the original value is transformed into another value, to store that in
>>> the same document.
>>>
>>> {
>>> "probe_id" : 1234567890,
>>> "experiment_id" : 1234567890,
>>> "raw_value" : 0.43524,
>>> "analysis": { "cbs" : 0.436, "CBS+GLAD" : 0.4356 }
>>> }
>>>
>>> Once added to the database almost all changes to the data will be
>>> contained within an experiment.
>>>
>>> MongoDB has something like collections, which would be an appropriate
>>> abstraction for an experiment. But in couchdb I would have to put all
>>> these probe documents in 1 big database without collections. So if I only
>>> make changes to probes within an experiment, this would affect the
>>> views over all the other billion documents in the db. Because of the
>>> large number of documents, it would be good to know beforehand what the
>>> performance implications of this are.
>>>
>>> Any suggestions are welcome.
>>>
>>> Tom
>>
>
>
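
The database-per-experiment idea suggested above could be sketched roughly as
follows: probe documents are grouped by experiment_id, and one request body for
CouchDB's _bulk_docs endpoint is built per experiment database. The naming
scheme "experiments/exp_<id>" and both helper names are hypothetical, and the
sample values are made up for illustration.

```python
import json
from collections import defaultdict


def experiment_db_name(experiment_id: int) -> str:
    """Hypothetical naming scheme: one database per experiment."""
    # CouchDB database names must start with a lowercase letter.
    return f"experiments/exp_{experiment_id}"


def bulk_payloads(probe_docs):
    """Group probe documents by experiment and build one _bulk_docs body per database."""
    grouped = defaultdict(list)
    for doc in probe_docs:
        grouped[experiment_db_name(doc["experiment_id"])].append(doc)
    # _bulk_docs expects a JSON object with a "docs" array.
    return {db: json.dumps({"docs": docs}) for db, docs in grouped.items()}


# Illustrative documents in the shape proposed in the original message.
probes = [
    {"probe_id": 1, "experiment_id": 42, "raw_value": 0.43524, "analysis": {"cbs": 0.436}},
    {"probe_id": 2, "experiment_id": 42, "raw_value": 0.21, "analysis": {}},
    {"probe_id": 1, "experiment_id": 43, "raw_value": 0.5, "analysis": {}},
]

for db, body in bulk_payloads(probes).items():
    print(db, len(json.loads(body)["docs"]))
```

Each body would be POSTed to the _bulk_docs endpoint of the matching database.
With this layout, updating probes in one experiment only touches the views of
that experiment's database.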
