incubator-couchdb-user mailing list archives

From km <>
Subject Re: couchdb for genome data
Date Thu, 04 Mar 2010 07:35:14 GMT

You could add an extra key to each document that identifies it as a probe,
e.g. a "type" key with the value "probe", like this:

{
       "type" : "probe",
       "probe_id" : 1234567890,
       "experiment_id" : 1234567890,
       "raw_value" : 0.43524,
       "analysis": { "cbs" : 0.436, "CBS+GLAD" : 0.4356 }
}

All your probe documents would then contain a key called "type" set to
"probe", and that key is how you tell them apart from other documents.
When you design a view that should cover probe documents only, you can use
a simple filter in the map function, like this:
 if(doc.type=='probe'){ do something ...}
The view will then index only documents of type "probe".
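
To make the filter concrete, here is a minimal sketch of such a map function
together with a tiny local harness. Inside CouchDB, emit() is supplied by the
view server; the stand-in emit(), the function name mapProbes, the sample
documents and the choice of [experiment_id, probe_id] as the view key are all
assumptions made for illustration, not something prescribed above.

```javascript
// Stand-in for CouchDB's emit(); it only collects rows so the sketch runs locally.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function as it could be stored in a design document.
// Keying on [experiment_id, probe_id] is an illustrative assumption.
function mapProbes(doc) {
  if (doc.type === 'probe') {
    emit([doc.experiment_id, doc.probe_id], doc.raw_value);
  }
}

// Sample documents: one probe and one hypothetical non-probe document.
const docs = [
  { _id: 'a', type: 'probe', probe_id: 1234567890, experiment_id: 1234567890,
    raw_value: 0.43524, analysis: { cbs: 0.436, 'CBS+GLAD': 0.4356 } },
  { _id: 'b', type: 'experiment', experiment_id: 1234567890 }
];

// Only the document with type "probe" produces a view row.
docs.forEach(mapProbes);
console.log(JSON.stringify(rows));
```

Documents without the "type" key, or with a different value, fall through the
if statement and simply do not appear in the view.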

NOTE: "type" is not a special or reserved key. It is a user-defined key just
like any other, so you can use any other name for it.

You might have other kinds of documents, for which the "type" value will differ.
There is no need to explicitly define a collection as in MongoDB;
all JSON documents can be stored in a single database.


On Thu, Mar 4, 2010 at 7:21 AM, Tom Sante <> wrote:

> Hi
> The data is now stored in a mysql table with about a billion (1000 million)
> rows.
> These rows are the data of a genetic test (arrayCGH) and build up like
> this:
> Every experiment (a few thousand of them total) contains measurements of
> about 180000 genetic probes. This raw data will be analyzed and the values
> run through different algorithms, so every probe needs to store more than 1
> value after the analysis is done. The values of the different analyses are now
> stored in columns in that table, making it a pain if we have to add an
> analysis that is not yet part of the existing columns. This is why a
> schema-free, document-based DB is probably a better fit.
> The initial idea was to give each probe a separate document, and when the
> original value is transformed to another value, store this in the same
> document.
> {
>        "probe_id" : 1234567890,
>        "experiment_id" : 1234567890,
>        "raw_value" : 0.43524,
>        "analysis": { "cbs" : 0.436, "CBS+GLAD" : 0.4356 }
> }
> Once added to the database almost all changes to the data will be contained
> within an experiment.
> MongoDB has something like collections, which would be an appropriate
> abstraction ~ experiment. But in CouchDB I would have to add all these probe
> documents to 1 big database without collections. So if I only make changes
> to probes within an experiment, this would influence the views of all the
> other billion documents in the db. Because of the large number of documents
> it would be good to know beforehand what the implications of this are
> performance-wise.
> Any suggestions are welcome.
> Tom
