incubator-couchdb-user mailing list archives

From Tom Sante <tom.sa...@gmail.com>
Subject Re: couchdb for genome data
Date Thu, 04 Mar 2010 11:18:50 GMT
Thanks. Is there any limit on the number of databases in CouchDB? A
few thousand probably won't be a problem, I guess?

On 4/03/10 11:49, Simon Metson wrote:
> Hi,
> Why not use a database per experiment? Do you need to process data
> across experiments? Can you store your raw data in individual databases
> and then pull summary data into a single database?
> Cheers
> Simon
>
> On 3 Mar 2010, at 22:21, Tom Sante wrote:
>
>> Hi
>>
>> The data is now stored in a MySQL table with about a billion (1000
>> million) rows.
>> These rows are the data of a genetic test (arrayCGH) and are built up
>> like this:
>>
>> Every experiment (a few thousand of them in total) contains measurements
>> of about 180000 genetic probes. This raw data will be analyzed and the
>> values run through different algorithms, so every probe needs to store
>> more than one value after the analysis is done. The values of the
>> different analyses are now stored in columns in that table, making it a
>> pain if we have to add an analysis that is not yet part of the existing
>> columns. This is why a schema-free, document-based DB is probably a
>> better fit.
>> The initial idea was to give each probe a separate document, and when
>> the original value is transformed into another value, to store this in
>> the same document.
>>
>> {
>> "probe_id" : 1234567890,
>> "experiment_id" : 1234567890,
>> "raw_value" : 0.43524,
>> "analysis": { "cbs" : 0.436, "CBS+GLAD" : 0.4356 }
>> }
>>
>> Once added to the database almost all changes to the data will be
>> contained within an experiment.
>>
>> MongoDB has something like collections, which would be an appropriate
>> abstraction for an experiment. But in CouchDB I would have to add all
>> these probe documents to one big database without collections. So if I
>> only make changes to probes within an experiment, this would affect the
>> views over all the other documents (about a billion) in the db. Because
>> of the large number of documents, it would be good to know beforehand
>> what the performance implications of this are.
>>
>> Any suggestions are welcome.
>>
>> Tom
>
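
A minimal sketch of the database-per-experiment layout suggested in the thread, assuming one CouchDB database per experiment and one document per probe. The naming scheme (`experiment_<id>`, `probe_<id>`) and the helper names are hypothetical and not taken from the thread. The code only builds the request for CouchDB's `_bulk_docs` endpoint and does not send anything.

```python
import json
import re

# CouchDB database names must start with a lowercase letter and may contain
# only lowercase letters, digits, and the characters _ $ ( ) + - /
_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def experiment_db_name(experiment_id):
    """Hypothetical naming scheme: one database per experiment."""
    name = "experiment_%d" % experiment_id
    if not _DB_NAME.match(name):
        raise ValueError("invalid CouchDB database name: %r" % name)
    return name


def probe_doc(probe_id, raw_value, analysis=None):
    """Build one probe document; the experiment is implied by the database."""
    return {
        "_id": "probe_%d" % probe_id,   # hypothetical document id scheme
        "raw_value": raw_value,
        "analysis": dict(analysis or {}),
    }


def bulk_request(experiment_id, docs):
    """Return (method, path, body) for a POST to the _bulk_docs endpoint."""
    path = "/%s/_bulk_docs" % experiment_db_name(experiment_id)
    return "POST", path, json.dumps({"docs": docs})


if __name__ == "__main__":
    doc = probe_doc(1234567890, 0.43524, {"cbs": 0.436, "CBS+GLAD": 0.4356})
    method, path, body = bulk_request(1234567890, [doc])
    print(method, path)
    print(body)
```

With this layout, adding a new analysis result only touches documents (and therefore views) in that experiment's database. Summary data could still be copied into a single shared database for cross-experiment queries, as Simon suggests.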

