couchdb-dev mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: ways to improve compaction
Date Tue, 22 Dec 2009 16:56:41 GMT
On Mon, Dec 21, 2009 at 2:20 PM, Damien Katz <damien@apache.org> wrote:
> I recently saw some issues people were having with compaction, and I
> thought I'd get some thoughts down about ways to improve the
> compaction code/experience.
>
> 1. Multi-process pipeline processing. Similar to the enhancements to
> view indexing, there are opportunities to pipeline operations instead
> of the current alternating read/write batch approach. This can reduce
> memory usage and make compaction faster. (See the pipeline sketch
> after the quote.)
> 2. Multiple disks/mount points. CouchDB could easily have 2 or more
> database dirs, and each time it compacts, it copies the new database
> file to another dir/disk/mount point. For servers with multiple disks
> this will greatly smooth the copying, as the disk heads won't need to
> seek between reads and writes. (See the target-dir sketch below.)
> 3. Better compaction algorithms. There are all sorts of clever things
> that could be done to make compaction faster. Right now it rebuilds
> the database much as it would if clients were bulk updating it. This
> was the simplest way to do it, but certainly not the fastest. There
> are a lot of ways to make this much more efficient; they just take
> more work.
> 4. Tracking wasted space. This can be used to determine a threshold
> for compaction. We don't need to track with 100% accuracy how much
> disk space is being wasted, but it would be a big improvement to at
> least know how much disk space the raw docs take, and maybe to
> estimate the size of the indexes needed to support them in a freshly
> compacted database. (See the threshold sketch below.)
> 5. Better low-level file driver support. Because we are using the
> built-in Erlang file system drivers, we don't have access to a lot of
> flags. If we had our own drivers, one option we'd like is to skip the
> OS cache for reads and writes during compaction: caching is
> unnecessary there, and it could completely consume the cache with
> rarely accessed data, evicting lots of recently used live data and
> greatly hurting the performance of other databases.
>
> Anyway, just getting these thoughts out. More ideas and especially code welcome.
>
> -Damien
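
As a rough illustration of the pipelining in (1): below is a minimal
sketch of a two-process copy pipeline, where a reader streams chunks
to a writer so reads and writes overlap. The module name and the
chunked-copy shape are made up for illustration; this is not CouchDB's
actual compactor.

    -module(compact_pipeline).
    -export([copy/2]).

    %% Reader and writer run as separate processes, so the next read
    %% can start while the previous chunk is still being written.
    copy(SrcPath, DstPath) ->
        {ok, Src} = file:open(SrcPath, [read, raw, binary]),
        Writer = spawn_link(fun() -> writer(DstPath) end),
        reader(Src, Writer).

    reader(Src, Writer) ->
        case file:read(Src, 1 bsl 20) of      % 1 MiB chunks
            {ok, Chunk} ->
                Writer ! {chunk, Chunk},      % hand off, keep reading
                reader(Src, Writer);
            eof ->
                Writer ! done,
                file:close(Src)
        end.

    %% Raw files must be used from the process that opened them, so
    %% the writer opens its own handle.
    writer(DstPath) ->
        {ok, Dst} = file:open(DstPath, [write, raw, binary]),
        writer_loop(Dst).

    writer_loop(Dst) ->
        receive
            {chunk, Chunk} ->
                ok = file:write(Dst, Chunk),
                writer_loop(Dst);
            done ->
                file:close(Dst)
        end.

A real pipeline would also bound the reader (e.g. with acks from the
writer) so a slow destination disk can't let the writer's mailbox grow
without limit.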
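
For (2), choosing the target could be as simple as picking any
configured data dir other than the one holding the current file. A
sketch, with the function name and dirs made up:

    %% Pick a compaction target on a different disk/mount point than
    %% the source, so compaction reads and writes hit different
    %% spindles. Crashes if only one dir is configured.
    choose_target_dir(SrcDir, DataDirs) ->
        hd([D || D <- DataDirs, D =/= SrcDir]).

    %% e.g. choose_target_dir("/disk1/couch",
    %%                        ["/disk1/couch", "/disk2/couch"])
    %% returns "/disk2/couch"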
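
And for (4), even an approximate running total of live bytes,
maintained as updates come in, would be enough for a simple trigger. A
hedged sketch:

    %% Recommend compaction once the estimated garbage fraction of the
    %% file exceeds a threshold (say 0.5). LiveBytes is an estimate of
    %% the raw docs plus the indexes a fresh file would need, not an
    %% exact figure.
    should_compact(LiveBytes, FileBytes, Threshold)
      when FileBytes > 0 ->
        (FileBytes - LiveBytes) / FileBytes > Threshold.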

Another thing worth considering is that if we get block alignment
right, then our copy-to-a-new-file compaction could end up working as
compact-in-place on content-addressable filesystems. Most of the
blocks won't change content, so the FS can just write new pointers to
existing blocks, and then garbage collect unneeded blocks later. If we
get the block alignment right...
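
To make that concrete: pad each write out to a fixed block size, so an
unchanged document occupies byte-identical blocks in both the old and
new file. A minimal sketch, assuming a 4 KiB block and ignoring
CouchDB's actual on-disk framing:

    %% Zero-pad a record to a 4096-byte boundary so unchanged data
    %% yields identical blocks across a compaction copy, letting a
    %% content-addressable/COW filesystem share blocks rather than
    %% store them twice.
    -define(BLOCK_SIZE, 4096).

    pad_to_block(Bin) ->
        case byte_size(Bin) rem ?BLOCK_SIZE of
            0 -> Bin;
            R ->
                PadBits = (?BLOCK_SIZE - R) * 8,
                <<Bin/binary, 0:PadBits>>
        end.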

Chris

-- 
Chris Anderson
http://jchrisa.net
http://couch.io
