incubator-couchdb-user mailing list archives

From Tom Sante <tom.sa...@gmail.com>
Subject couchdb for genome data
Date Wed, 03 Mar 2010 22:21:18 GMT
Hi

The data is currently stored in a MySQL table with about a billion (1000 
million) rows.
These rows hold the data of a genetic test (arrayCGH) and are structured as follows:

Every experiment (a few thousand in total) contains measurements of 
about 180000 genetic probes. This raw data will be analyzed and the 
values run through different algorithms, so every probe needs to store 
more than one value once the analysis is done. The values of the different 
analyses are currently stored as columns in that table, which makes it a 
pain to add an analysis that is not yet among the existing 
columns. This is why a schema-free, document-based DB is probably a 
better fit.
The initial idea was to give each probe a separate document and, when 
the original value is transformed into another value, to store the result 
in the same document.

{
	"probe_id" : 1234567890,
	"experiment_id" : 1234567890,
	"raw_value" : 0.43524,
	"analysis": { "cbs" : 0.436, "CBS+GLAD" : 0.4356 }
}
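
To illustrate why this layout avoids schema changes, here is a minimal Python sketch that builds such a probe document and attaches analysis results to it. The helper names `make_probe_doc` and `add_analysis` are hypothetical and only for illustration; nothing here talks to CouchDB, it only shows the shape of the data.

```python
import json


def make_probe_doc(probe_id, experiment_id, raw_value):
    """Build a probe document with an empty set of analysis results."""
    return {
        "probe_id": probe_id,
        "experiment_id": experiment_id,
        "raw_value": raw_value,
        "analysis": {},
    }


def add_analysis(doc, name, value):
    """Return a copy of the document with one more analysis result.

    A new analysis is just a new key under "analysis", so no schema
    change is needed (unlike adding a column to the MySQL table).
    """
    updated = dict(doc)
    updated["analysis"] = {**doc.get("analysis", {}), name: value}
    return updated


doc = make_probe_doc(1234567890, 1234567890, 0.43524)
doc = add_analysis(doc, "cbs", 0.436)
doc = add_analysis(doc, "CBS+GLAD", 0.4356)
print(json.dumps(doc, indent=2))
```

An analysis that does not exist yet would be added the same way, with a new key, and documents that never ran that analysis simply would not have it.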

Once added to the database, almost all changes to the data will be 
contained within a single experiment.

MongoDB has collections, which would be an appropriate 
abstraction for an experiment. But in CouchDB I would have to put all these 
probe documents in one big database without collections. So if I only make 
changes to probes within one experiment, this would affect the views over 
all the other documents in the db (about a billion). Because of the large 
number of documents, it would be good to know beforehand what the 
performance implications of this are.
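
For a sense of scale, here is a rough back-of-the-envelope calculation in Python using only the numbers above (about a billion documents, about 180000 probes per experiment). The constant names are mine, and the figures are approximations, not measurements.

```python
# Approximate figures taken from the description above.
PROBES_PER_EXPERIMENT = 180_000
TOTAL_DOCS = 1_000_000_000

# Number of experiments implied by those two figures.
experiments = TOTAL_DOCS / PROBES_PER_EXPERIMENT

# Share of all documents that belong to a single experiment, i.e. the
# share that changes when one experiment is re-analyzed.
fraction = PROBES_PER_EXPERIMENT / TOTAL_DOCS

print(f"experiments implied: about {experiments:.0f}")
print(f"documents changed per experiment update: {fraction:.3%} of the db")
```

So a change to one experiment touches roughly 0.018% of the documents, which is why I would like to know whether the cost of updating the views depends on the changed documents or on the whole database.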

Any suggestions are welcome.

Tom
