couchdb-user mailing list archives

From James Marca <jma...@translab.its.uci.edu>
Subject Re: reducing db size
Date Mon, 14 May 2012 20:08:24 GMT
On Mon, May 14, 2012 at 03:42:01PM -0400, Tim Tisdall wrote:
> Yes, I did it with a PUT for each id.  When you call for compaction, is
> there a way to see the progress or a way to know if it's done?

the "status" tool in Futon will show you compaction progress

Also, two other things.  Inserting data goes faster if you use the
bulk_docs interface.  To keep things under control, I like to insert
about 100 docs at a time, but it really depends on your doc size.
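Something like this is what I mean (a sketch only; the db name, batch
size, and doc shape are placeholders):

    # Insert docs through _bulk_docs in slices of ~100 per request.
    import requests

    def bulk_insert(docs, url="http://localhost:5984/mydb/_bulk_docs",
                    batch_size=100):
        for i in range(0, len(docs), batch_size):
            resp = requests.post(url, json={"docs": docs[i:i + batch_size]})
            resp.raise_for_status()

    # Example: 1000 trivial docs, sent as 10 batches.
    bulk_insert([{"_id": "obs-%06d" % n, "value": n} for n in range(1000)])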

Second, I have found in my own totally unscientific testing that large
documents compact better than many small documents.  

For example, I have detector data with one record per 30 seconds.  If
I combine the data into daily docs, then after compaction the
resulting database is much smaller than if I keep one document per
observation.
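To make the layout concrete, the two shapes look roughly like this
(field names are made up for illustration, not my real schema):

    # Hypothetical doc shapes, for illustration only.

    # One doc per 30-second observation: many tiny docs.
    per_observation = {
        "_id": "det42-2012-05-14T00:00:30",
        "detector": "det42",
        "ts": "2012-05-14T00:00:30Z",
        "count": 17,
    }

    # One doc per detector-day: all 2880 readings in a single doc,
    # which gives the compressor repeated structure to work with.
    daily = {
        "_id": "det42-2012-05-14",
        "detector": "det42",
        "date": "2012-05-14",
        "counts": [17, 12, 9],  # ...2880 entries for a full day
    }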

I ran these tests back around the 1.0.1 generation of CouchDB, but I
think the reason compaction doesn't work well for small documents is
the same reason gzip doesn't work well for small files: if there is
very little repeated information in a document, then gzip and other
compression utilities can't do much.  The larger the doc, the more
repeats the text will have, and the better the compression algorithms
perform.

But all of those compression savings will be wasted if you then have
to write a view that explodes each doc back into the smaller docs.

Oh, and don't forget to compact any views you use as well.
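View compaction is per design document, something like this (the
design doc name "stats" is a placeholder):

    # Compact the view index built from _design/stats.
    import requests

    requests.post("http://localhost:5984/mydb/_compact/stats",
                  headers={"Content-Type": "application/json"})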

Hope that helps,
James

> 
> On Mon, May 14, 2012 at 3:20 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> 
> > How did you insert them? If you did a PUT per docid you'll still want
> > to compact afterwards.
> >
> > On Mon, May 14, 2012 at 2:13 PM, Tim Tisdall <tisdall@gmail.com> wrote:
> > > I've got several gigabytes of data that I'm trying to store in a
> > > couchdb on a single machine.  I've placed a section of the data in an
> > > sqlite db and the file is about 5.9gb.  I'm currently placing the same
> > > data into couchdb and while it hasn't finished yet, the file size is
> > > already 10gb and continuing to grow.  The sqlite database is
> > > essentially a table of ids with a json block of text for each, so I
> > > figured the couchdb wouldn't be too much different in size.
> > >
> > > Does anyone have some recommendations on how to reduce the size of the
> > > db?  Right now I've only inserted data and have not made any "updates"
> > > to documents, so there should be no revision copies to be cleared away.
> >

