incubator-couchdb-user mailing list archives

From Simon Metson <simonmet...@googlemail.com>
Subject Re: couchdb for genome data
Date Thu, 04 Mar 2010 10:49:14 GMT
Hi,
Why not use a database per experiment? Do you need to process data
across experiments? Can you store your raw data in individual
databases and then pull summary data into a single database?
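
As a rough illustration of that layout, here is a minimal Python sketch.
It only builds the request bodies that would be POSTed to CouchDB's
_bulk_docs endpoint of each per-experiment database, plus one small
document for a shared summary database. The "experiment_<id>" naming
scheme, the helper names, and the summary fields (probe_count,
mean_raw_value) are assumptions made up for this example, not something
from the original thread.

```python
import json
from collections import defaultdict


def experiment_db_name(experiment_id):
    """Hypothetical naming scheme: one database per experiment."""
    return f"experiment_{experiment_id}"


def group_by_experiment(probe_docs):
    """Map each database name to a request body of the form {"docs": [...]}."""
    grouped = defaultdict(list)
    for doc in probe_docs:
        grouped[experiment_db_name(doc["experiment_id"])].append(doc)
    return {name: {"docs": docs} for name, docs in grouped.items()}


def summary_doc(experiment_id, probe_docs):
    """Build one small document per experiment for the summary database.

    The fields chosen here are illustrative assumptions.
    """
    raw = [d["raw_value"] for d in probe_docs]
    return {
        "_id": experiment_db_name(experiment_id),
        "experiment_id": experiment_id,
        "probe_count": len(raw),
        "mean_raw_value": sum(raw) / len(raw),
    }


# Made-up sample data in the shape of the probe document quoted below.
probes = [
    {"probe_id": 1, "experiment_id": 42, "raw_value": 0.5, "analysis": {"cbs": 0.436}},
    {"probe_id": 2, "experiment_id": 42, "raw_value": 1.5, "analysis": {"cbs": 0.51}},
    {"probe_id": 1, "experiment_id": 43, "raw_value": 0.25, "analysis": {}},
]

payloads = group_by_experiment(probes)
summary = summary_doc(42, [p for p in probes if p["experiment_id"] == 42])
print(json.dumps(summary))
```

With this split, views defined in one experiment's database only ever
index that experiment's probes, and cross-experiment queries run against
the much smaller summary database.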
Cheers
Simon

On 3 Mar 2010, at 22:21, Tom Sante wrote:

> Hi
>
> The data is now stored in a mysql table with about a billion (1000  
> million) rows.
> These rows hold the data of a genetic test (arrayCGH) and are
> structured like this:
>
> Every experiment (a few thousand of them in total) contains
> measurements of about 180000 genetic probes. This raw data will be
> analyzed and the values run through different algorithms, so every
> probe needs to store more than one value after the analysis is done.
> The values of the different analyses are now stored in columns in
> that table, which makes it a pain whenever we have to add an analysis
> that is not yet part of the existing columns. This is why a
> schema-free, document-based DB is probably a better fit.
> The initial idea was to give each probe a separate document, and
> when the original value is transformed into another value, to store
> that in the same document.
>
> {
> 	"probe_id" : 1234567890,
> 	"experiment_id" : 1234567890,
> 	"raw_value" : 0.43524,
> 	"analysis": { "cbs" : 0.436, "CBS+GLAD" : 0.4356 }
> }
>
> Once added to the database almost all changes to the data will be  
> contained within an experiment.
>
> MongoDB has something like collections, which would be an appropriate
> abstraction for an experiment. But in couchdb I would have to add all
> these probe documents to one big database without collections. So
> even if I only make changes to probes within one experiment, this
> would influence the views over all the other billion documents in the
> db. Because of the large number of documents, it would be good to
> know beforehand what the performance implications of this are.
>
> Any suggestions are welcome.
>
> Tom

