couchdb-user mailing list archives

From "Brian Troutwine" <goofyheadedp...@gmail.com>
Subject Proper Database Design and View Collation
Date Thu, 04 Sep 2008 00:50:50 GMT
I'm currently using CouchDB to store time-series data, but am having
difficulty conceptualizing a proper database design. In this email I
will outline the system I would like to develop, summarize my current
approach and give what I see to be its current defects. I would
appreciate any comments and suggestions toward improving my
implementation.

Specifically, I'm gathering data from a number of meteorological sensors.
These devices take readings of various factors (ambient temperature,
atmospheric pressure and relative humidity) on a fixed interval, say
once per minute, and store them until I am able to retrieve them.
Currently a retrieval attempt is made once per hour, though the connection
to an individual device is tenuous at best, so information concerning the
last attempted retrieval and the last successful retrieval must be
stored. Additionally, each sensor has a number of static attributes
which I also store in CouchDB, such as the sensor's unique ID and GPS
coordinates.

I represent each sensor as a single document in CouchDB, storing the
readings as documents, with timestamps, in a list. Here's an example:

 {"sensor_id" : "SENSOR01123",
  "coordinate" : [46.209722, -122.192778],
  "last_attempt" : 1220480444,
  "last_update" : 1217887865,
  "readings" : [{"time_gathered" : 1217023706,
                 "temp" : 18,
                 "pressure" : 102.311,
                 "humidity" : 99},
                ...
               ]
 }

I have four views: get_new_attempts, find_unresponsive,
find_malfunctioning and last_reading. The first two are simple: they
compare the last_attempt and last_update fields, respectively, to the
current date, emitting sensor_id and coordinate. The third computes
the standard deviation of the temperature, pressure and humidity
measurements across all readings and emits the sensor_id of any
sensor whose deviation exceeds a fixed, acceptable threshold. As all
the readings are stored in the sensor document, this computation is
currently a straightforward iteration. The last emits the sensor_id
as key and the data reading with the largest time_gathered as value.
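
To make the last two views concrete, here is a minimal sketch of their
logic over the single-document layout shown above. It is not my exact
code: the threshold MAX_STDDEV, the function names, and the use of a
population standard deviation are assumptions made for illustration. In
CouchDB itself emit is a global available to the map function; here it is
passed in as a parameter so the sketch can be run on its own.

```javascript
// Hypothetical threshold; the acceptable deviation is not given in this email.
var MAX_STDDEV = 5;

// Population standard deviation of a non-empty array of numbers.
function stddev(values) {
  var mean = values.reduce(function (a, b) { return a + b; }, 0) / values.length;
  var variance = values.reduce(function (acc, v) {
    return acc + (v - mean) * (v - mean);
  }, 0) / values.length;
  return Math.sqrt(variance);
}

// find_malfunctioning: emit the sensor_id if any measured quantity
// deviates by more than the threshold across the stored readings.
function findMalfunctioning(doc, emit) {
  if (!doc.readings || doc.readings.length === 0) { return; }
  var fields = ["temp", "pressure", "humidity"];
  var bad = fields.some(function (field) {
    var series = doc.readings.map(function (r) { return r[field]; });
    return stddev(series) > MAX_STDDEV;
  });
  if (bad) { emit(doc.sensor_id, null); }
}

// last_reading: emit sensor_id as key and the reading with the
// largest time_gathered as value.
function lastReading(doc, emit) {
  if (!doc.readings || doc.readings.length === 0) { return; }
  var latest = doc.readings.reduce(function (a, b) {
    return b.time_gathered > a.time_gathered ? b : a;
  });
  emit(doc.sensor_id, latest);
}
```

Both functions iterate over every reading in the document, which is why
they are simple to write in this layout and why they depend on all
readings living in one place.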

The main problem with this approach is that each sensor document
eventually becomes quite large. I will exhaust my machine's ability to
fit more than a few documents in memory in less than a month. Also,
though I have read cmlenz's CouchDB "Joins", I do not see how I might
go about writing the find_malfunctioning and last_reading views if I
were to store readings as separate documents without modifying the
return value of the views (I am loath to do that).

Is it possible to store readings in separate documents and still
maintain the functionality outlined above? If so, how might I go about
doing that?

Thanks,
Brian


-- 
Brian
