Brian,

Well, I thought I had an answer, but I've managed to thoroughly confuse myself. The original idea is basically to emit one doc per sensor reading, then use group=true on a reduced view and calculate the data you need when all the passed-in keys are equal. But looking at the logging output from the view, I'm getting very confused by how this beast works.

So anyway, I'll throw this out there. It might work, might not. I'm especially concerned about what happens during a rereduce. Right now I haven't got enough brain power left to load a database with a shitload of data to try and force it to happen (or to find out whether that would even force it).

Anyway, on to the brain dump:

1 doc per sensor that contains everything except the readings attribute.
1 doc per reading.

The two kinds of doc would get type attributes to distinguish them.

Your first two views are fine, as they don't require accessing the readings attribute.

Tackling your fourth view, last_reading. The map function would look something like:

    function(doc)
    {
        if(doc.type != "reading") return ;
        emit(doc.sensor_id, [doc.time_gathered, doc.temp, doc.pressure, ...etc]) ;
    }

And the reduce:

    function(keys, values, rereduce)
    {
        // Not sure why I'm getting null for keys. This was the beginning of the doubt.
        if(keys == null) return null ;

        // Bail out unless every row in this batch has the same emitted key.
        for(var i = 1 ; i < keys.length ; i++)
        {
            if(keys[i][0] != keys[i-1][0])
            {
                return null ;
            }
        }

        // Find the index of the value with the largest time_gathered.
        var max_time = 0 ;
        for(var i = 1 ; i < values.length ; i++)
        {
            if(values[i][0] > values[max_time][0])
            {
                max_time = i ;
            }
        }

        return values[max_time] ;
    }

And then the query is done like so:

http://localhost:5984/sensors/_view/readings/last?group=true

That pattern could be trivially adapted for stddev calculations.

I mentioned being worried about rereduce. I have no idea what gets passed in, so I haven't the slightest idea how that would work. For finding the last reading, I *think* it might work as is. The standard deviation stuff would be more tricky though.
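On the rereduce worry for last_reading, here is a rough sketch rather than a tested view. It assumes that on a rereduce CouchDB calls the function with keys set to null and values set to earlier return values of the same function. If that holds, the input and output have the same shape ([time_gathered, temp, pressure, ...]) and the function can ignore keys entirely. The name lastReading is only there so the snippet can be run on its own; in a view it would be an anonymous function.

```javascript
// Sketch of a rereduce-tolerant reduce for last_reading.
// Assumption: values are [time_gathered, ...] arrays on a plain reduce,
// and earlier outputs of this same function (same shape) on a rereduce.
function lastReading(keys, values, rereduce) {
  var latest = null;
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    if (v === null) continue; // skip empty partial results
    if (latest === null || v[0] > latest[0]) {
      latest = v; // keep the row with the largest time_gathered
    }
  }
  return latest;
}
```

Because picking a maximum gives the same answer no matter how the rows are batched, the same code path can serve both the reduce and the rereduce call. Standard deviation cannot be handled by picking one row, so it needs a different shape of state.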
Basically you'd have to use a single-pass stddev algorithm, which is pretty simple, and then instead of returning just a stddev, you'd return a structure that holds the appropriate state information (num_samples, mean, variance). Refer to [1] for a Java implementation of single-pass stddev calculation if you haven't seen it before. And if you detected a rereduce, you'd just combine the set of passed-in structures, which should theoretically be possible, but there'd be some trickery involved.

[1] http://www.slamb.org/svn/repos/trunk/projects/common/src/java/org/slamb/common/stats/Sample.java

On Wed, Sep 3, 2008 at 8:50 PM, Brian Troutwine wrote:
> I'm currently using CouchDB to store time-series data, but am having
> difficulty conceptualizing a proper database design. In this email I
> will outline the system I would like to develop, summarize my current
> approach and give what I see to be its current defects. I would
> appreciate any comments and suggestions toward improving my
> implementation.
>
> As I said, I'm gathering data from a number of meteorological sensors.
> These devices take readings of various factors (ambient temperature,
> atmospheric pressure and relative humidity) on a fixed interval, say
> one per minute, and store them until I am able to retrieve them.
> Currently an attempt is made once per hour, though the connection to
> an individual device is tenuous at best, so information concerning the
> last attempted retrieval and the last successful retrieval must be
> stored. Additionally, each sensor has a number of static attributes
> which I also store in CouchDB, such as the sensor's unique ID and GPS
> coordinates.
>
> I represent each sensor as a single document in CouchDB, storing the
> readings as documents, with timestamps, in a list.
> Here's an example:
>
> {"sensor_id" : "SENSOR01123",
>  "coordinate" : [46.209722, -122.192778],
>  "last_attempt" : 1220480444,
>  "last_update" : 1217887865,
>  "readings" : [{"time_gathered" : 1217023706,
>                 "temp" : 18,
>                 "pressure" : 102.311,
>                 "humidity" : 99},
>                ...
>               ]
> }
>
> I have four views: get_new_attempts, find_unresponsive,
> find_malfunctioning and last_reading. The first two are simple: they
> compare the last_attempt and last_update fields, respectively, to the
> current date, emitting sensor_id and coordinate. The third requires
> computing the standard deviation of the temperature, pressure and
> humidity measurements of all readings and emits the sensor_id of any
> sensor that has more than a fixed, acceptable deviation. As all the
> readings are stored in the sensor document this computation is,
> currently, a straightforward iteration. The last emits the
> sensor_id as key and the data reading with the largest time_gathered
> as value.
>
> The main problem with this approach is that the sensor document
> eventually becomes quite large. I will exhaust my machine's ability to
> fit more than a few documents in memory in less than a month. Also,
> though I have read cmlenz's CouchDB "Joins", I do not see how I might
> go about writing the find_malfunctioning and last_reading views if I
> were to store readings as separate documents without modifying the
> return value of the views (I am loath to do that).
>
> Is it possible to store readings in separate documents and still
> maintain the functionality outlined above? If so, how might I go about
> doing that?
>
> Thanks,
> Brian
>
> --
> Brian
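
P.S. For the find_malfunctioning view, the "return a structure and combine it on rereduce" idea from the top of this mail could be sketched roughly as below. This is untested against CouchDB and rests on several assumptions: the map emits one number per reading (say doc.temp) keyed by sensor_id; on rereduce the values are earlier outputs of the same function; and the state holds m2 (the running sum of squared deviations) in place of the variance itself, since that makes merging simpler. It uses Welford's update for single samples and the usual pairwise merge formula for partial states. All function and field names are made up for illustration.

```javascript
// Fold one numeric sample into the running state (Welford's update).
function addSample(state, x) {
  var n = state.num_samples + 1;
  var delta = x - state.mean;
  var mean = state.mean + delta / n;
  return { num_samples: n, mean: mean, m2: state.m2 + delta * (x - mean) };
}

// Merge two partial states (pairwise combine of count, mean and m2).
function combine(a, b) {
  if (a.num_samples === 0) return b;
  if (b.num_samples === 0) return a;
  var n = a.num_samples + b.num_samples;
  var delta = b.mean - a.mean;
  return {
    num_samples: n,
    mean: a.mean + delta * b.num_samples / n,
    m2: a.m2 + b.m2 + delta * delta * a.num_samples * b.num_samples / n
  };
}

// Reduce body: values are raw samples on reduce, partial states on rereduce.
function stddevReduce(keys, values, rereduce) {
  var state = { num_samples: 0, mean: 0, m2: 0 };
  for (var i = 0; i < values.length; i++) {
    state = rereduce ? combine(state, values[i]) : addSample(state, values[i]);
  }
  return state;
}

// Population standard deviation from a final state.
function stddev(state) {
  return state.num_samples > 0 ? Math.sqrt(state.m2 / state.num_samples) : 0;
}
```

In an actual view the helpers would presumably have to live inside the reduce function itself, and the client would compute stddev from the returned state and compare it to the acceptable deviation. That does change what the view returns, which may or may not be acceptable.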