incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Database size variation
Date Sat, 18 Dec 2010 03:32:07 GMT
On Fri, Dec 17, 2010 at 9:19 PM, Chris Johnson <csj790@yahoo.com> wrote:
> I have a script that logs into a firewall, exports the session table, parses
> it and writes a subset of the data to a database. Each session is a doc in
> the database. Currently, because of the way the database is exported, it is
> serialized and each doc is written to the database one by one. The database
> that is generated is extremely large. For example, the last database had
> 1.5M documents. As part of this process, the most recent database is
> replicated to another database with a known name.
>
> One thing I just noticed is the replicated database is significantly smaller
> in size. As an example, the database that I referred to above is 11+ Gig in
> size, but the replicated database is only 4 Gigs. Everything between the two
> databases appears to be consistent and the number of records/update sequences
> are identical, so why such a variation in size?
>
> Chris
>

Try compacting the database you're inserting into. CouchDB's database
file is append-only, so each isolated write incurs some overhead on
disk. The more you batch writes, the less overhead per document.
Replication batches its writes by default, which is why the replicated
copy ends up smaller than one built from single-document inserts. If
you're doing a batch import, use _bulk_docs as much as possible to
reduce this overhead.
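
As a rough illustration of both suggestions, here is a minimal Python
sketch using only the standard library. The server URL, the database
name "sessions", the batch size of 1000, and all of the helper function
names are assumptions made up for this example, not anything from your
script. It posts documents to _bulk_docs in chunks and then asks the
server to compact the database. Depending on your setup, compaction may
need admin credentials, which this sketch does not handle.

```python
import json
import urllib.request
from itertools import islice

# Assumed values for illustration only; adjust to your own setup.
COUCH_URL = "http://localhost:5984"
DB_NAME = "sessions"  # hypothetical database name


def batches(docs, size=1000):
    """Yield lists of at most `size` docs from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_docs_request(db_url, docs):
    """Build one POST to _bulk_docs that carries many docs at once."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        db_url + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compact_request(db_url):
    """Build the POST that asks CouchDB to compact the database."""
    return urllib.request.Request(
        db_url + "/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    db_url = COUCH_URL + "/" + DB_NAME
    # Stand-in for the parsed firewall session records.
    sessions = ({"src": "10.0.0.%d" % (i % 250), "n": i} for i in range(5000))
    for chunk in batches(sessions):
        send(bulk_docs_request(db_url, chunk))  # one write per 1000 docs
    send(compact_request(db_url))  # reclaim space left by earlier writes
```

Because batches() accepts any iterable, the parser can stay serialized
and still feed documents through in chunks rather than one write per
session.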

HTH,
Paul Davis
