incubator-couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: view file growing too large
Date Sat, 11 Feb 2012 14:39:01 GMT
Your dataset grows without bound, and therefore so do your database and
the indexes into that dataset.

A common pattern around this is the "temporal database" idea that
Simon alludes to. It follows from the observation that even if you did
delete old datapoints your database would not shrink (the tombstones
for a deleted document still take up space forever). At some interval
(hourly, daily, weekly, monthly), you start writing stats to a new
database. You can then archive or delete old databases.
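
A minimal sketch of how the rotation could be driven from the client
side. The "stats" prefix, the interval names, and both helper functions
are assumptions made for illustration, not anything CouchDB provides; the
only property relied on is that the generated names sort chronologically.

```python
from datetime import datetime, timezone

# Hypothetical naming scheme: one database per interval, named so that
# lexical order matches chronological order.
FORMATS = {
    "hourly": "%Y_%m_%d_%H",
    "daily": "%Y_%m_%d",
    "monthly": "%Y_%m",
}


def db_name_for(ts: datetime, interval: str = "daily", prefix: str = "stats") -> str:
    """Return the name of the database that should receive a datapoint at ts."""
    stamp = ts.astimezone(timezone.utc).strftime(FORMATS[interval])
    return f"{prefix}_{stamp}"


def expired(names, keep: int):
    """Return the database names that fall outside the newest `keep`."""
    ordered = sorted(names)  # chronological, because of the naming scheme
    return ordered[:-keep] if keep > 0 else ordered
```

The writer would call db_name_for() on every insert and create the
database when the name changes; whatever expired() returns is a candidate
for archiving or deletion.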

A better plan would be to use a round-robin database, which is the
common storage approach taken by the munin/statsd/graphite/zenoss/etc
tools.
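
To illustrate the round-robin idea only: the sketch below is an in-memory
ring of fixed size, not the on-disk format of any of the tools named
above. The class name, the slot count, and the step size are all
hypothetical.

```python
class RoundRobinSeries:
    """Fixed-size series: new datapoints overwrite the oldest slot."""

    def __init__(self, slots: int, step: int):
        self.slots = slots              # number of datapoints retained
        self.step = step                # seconds covered by one slot
        self.values = [None] * slots
        self.stamps = [None] * slots    # which time bucket each slot holds

    def record(self, ts: int, value: float) -> None:
        bucket = ts // self.step
        i = bucket % self.slots         # wrap around, so storage never grows
        self.values[i] = value
        self.stamps[i] = bucket

    def read(self, ts: int):
        bucket = ts // self.step
        i = bucket % self.slots
        # A slot that has been overwritten by a newer bucket reads as missing.
        return self.values[i] if self.stamps[i] == bucket else None
```

With slots=3 and step=10, recording at t=0, 10, 20 and 30 leaves the
t=0 value overwritten by the t=30 one, so the storage size is fixed no
matter how long the writer runs.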

Simply stated: If you want your dataset to be smaller, delete some of it.

B.

On 11 February 2012 14:17, Simon Metson <simonmetson@googlemail.com> wrote:
> One thing to avoid is emitting more data from the view than you actually
> need. If you emit doc as a value, you probably want to change that, as it
> means you've duplicated all the doc data. Also, there are a lot of cases
> where one view can provide input to N different pieces of your application
> with appropriate view slicing and complex keys. Without more info it's
> hard to be more specific.
>
> Lastly, if you're just logging data I'd consider rotating through
> databases, one per day say. Once the day is over, hit the view to get the
> full summary info for the day, and maybe store that in another database.
> That means your active view has a fixed maximum size and you can archive
> stuff you don't necessarily need (for example, you could keep the summary
> views live but take the raw data offline).
> Hope that helps
> Simon
>
> On 10 Feb 2012, at 15:48, C J <guerillanerd@gmail.com> wrote:
>
>> The view file for my database is growing ten times faster than my database.
>> View compaction recovers much of this used space, but I'd like to minimize
>> how often I run view compaction.
>>
>> Here's some background: I'm attempting to use couchdb as the backend to a
>> metrics and statistics system for our application. It is VERY similar to
>> statsd, if you're familiar with that. What this means is that we send a new
>> document to couch every 10 seconds. We never update existing documents and
>> never delete documents. The documents can contain anywhere from 20 to 500
>> datapoints. Each datapoint is emitted separately in the format:
>> key: ["datapoint.name",2012,2,9,13,0,0] value <some small number>.
>>
>> Because we are writing so much data so frequently, I've found that I need
>> to keep our view warm by querying it on an interval (currently every ten
>> seconds). Functionally, this solution works great: when we hit the views
>> for real, they respond quite quickly. The problem is that this view warming
>> causes the view file to grow very quickly.
>>
>> Anyone know a way around this? FWIW, my view does have a reduce function
>> and for the view warming query, I've tried versions with reduce=true and
>> reduce=false.
>>
>> Thanks in advance,
>> GN
