couchdb-dev mailing list archives

From: Robert Newson <robert.new...@gmail.com>
Subject: Re: scalability of couchdb
Date: Mon, 31 May 2010 15:45:03 GMT
I've previously proposed extensions to the single-file approach to
data storage, but I think it's worth remembering the advantages of the
current scheme, with a view to preserving all of them in any future
enhancement (in my view, losing even one of them is a deal breaker).

Firstly, by always appending to a file, we avoid a large class of
corruption scenarios. Other systems that perform in-place
modifications have to guard against reordered writes, partial writes,
etc. They generally do this by keeping an ancillary append-only file,
called a transaction log or journal, which must be carefully replayed
after a crash. A CouchDB database file is, in effect, already a
journal.

Secondly, by always appending to a file, the completion of a single
fsync() call means all prior data is flushed to disk, allowing a
successful response code to really mean that your data is durable.
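The second point, sketched in Python under stated assumptions: `commit()` and its `"201 Created"` return value are hypothetical, not CouchDB's API. Since a write only ever appends to the one file, a single `os.fsync()` after the batch is the moment the whole batch becomes durable, and the success code is returned only after that call completes.

```python
import os
import tempfile


def commit(path: str, updates: list) -> str:
    """Hypothetical write path: append a batch, fsync once, then acknowledge."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        for update in updates:
            view = memoryview(update)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
        # Nothing in the file is modified in place, so one fsync() flushes
        # every byte appended so far, this batch included.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Only now is it safe to report success to the client.
    return "201 Created"


path = os.path.join(tempfile.mkdtemp(), "demo.couch")
status = commit(path, [b"hello ", b"world"])
print(status)  # 201 Created
```

For a newly created file, a fully durable implementation would also fsync the containing directory so the file's directory entry survives a crash; the sketch omits that step.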

Thirdly, a crash at any point does not permanently waste space on
incomplete operations. If you crash halfway through adding a 50 MiB
attachment, the next compaction cleans up the 25 MiB of unreferenced
data for you. Alternative systems need to track additional state to
do the appropriate cleanup.
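The third point can be modelled with a toy compactor in Python. The `Chunk` type, the `compact()` function and the sizes are hypothetical: compaction copies only the chunks reachable from the last committed header into a fresh layout, so the half-written attachment left by the crash is never copied and no extra bookkeeping is needed to find it.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass(frozen=True)
class Chunk:
    offset: int  # where the chunk starts in the file
    size: int    # bytes it occupies


def compact(chunks: list, live_offsets: set) -> tuple:
    """Copy live chunks into a new, densely packed layout.

    Returns (new_chunks, reclaimed_bytes). A chunk is live only if the last
    committed header references its offset; anything a crash left behind is
    unreferenced and therefore simply not copied.
    """
    new_chunks, pos = [], 0
    for chunk in chunks:
        if chunk.offset in live_offsets:
            new_chunks.append(Chunk(pos, chunk.size))
            pos += chunk.size
    reclaimed = sum(c.size for c in chunks) - pos
    return new_chunks, reclaimed


# Committed document data, then half of a 50 MiB attachment written before a crash.
old = [Chunk(0, 2 * MiB), Chunk(2 * MiB, 25 * MiB)]
new, reclaimed = compact(old, live_offsets={0})
print(reclaimed // MiB)  # 25
```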

My proposed enhancement preserves all of these properties. The idea
is that a CouchDB database consists of a series of files instead of a
single one, with a strict order between the files. This is
essentially the on-disk format of Berkeley DB Java Edition (JE). The
main advantage is that any one of these files can be 'compacted'
independently. CouchDB would track the 'fill rate' of each file (the
fraction of its entries that are still current). When this value
drops below a threshold, all current entries are copied from that
file to the tail file, and then the old file is simply deleted.
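A toy, in-memory Python model of the proposed multi-file scheme. The class and parameter names (`SegmentedLog`, `entries_per_file`, `threshold`) are hypothetical, each "file" is just a Python list, and a real implementation would have to make the copied entries durable before deleting the old file. Each file's fill rate is tracked as a live-entry counter; when a non-tail file drops below the threshold, its current entries are re-appended to the tail file and the old file is dropped.

```python
class SegmentedLog:
    """Toy model of a database stored as an ordered series of append-only files."""

    def __init__(self, entries_per_file=4, threshold=0.5):
        self.entries_per_file = entries_per_file
        self.threshold = threshold
        self.files = {0: []}  # file number -> list of (key, value), append-only
        self.live = {0: 0}    # file number -> count of entries still current
        self.tail = 0
        self.index = {}       # key -> (file number, position) of the current version

    def _append(self, key, value):
        if len(self.files[self.tail]) >= self.entries_per_file:
            self.tail += 1  # start a new tail file; file numbers give the strict order
            self.files[self.tail] = []
            self.live[self.tail] = 0
        old = self.index.get(key)
        if old is not None:
            self.live[old[0]] -= 1  # the previous version is no longer current
        self.files[self.tail].append((key, value))
        self.live[self.tail] += 1
        self.index[key] = (self.tail, len(self.files[self.tail]) - 1)

    def put(self, key, value):
        self._append(key, value)
        self._clean()

    def get(self, key):
        number, pos = self.index[key]
        return self.files[number][pos][1]

    def fill_rate(self, number):
        return self.live[number] / len(self.files[number])

    def _clean(self):
        for number in sorted(self.files):
            if number == self.tail:
                continue  # never clean the file currently being appended to
            if self.fill_rate(number) < self.threshold:
                current = [(k, v) for i, (k, v) in enumerate(self.files[number])
                           if self.index.get(k) == (number, i)]
                for k, v in current:
                    self._append(k, v)  # copy live entries to the tail file
                del self.files[number]  # then simply delete the old file
                del self.live[number]


log = SegmentedLog(entries_per_file=4, threshold=0.5)
for k in "abcd":
    log.put(k, 1)  # file 0: a b c d, all current
for k in "abc":
    log.put(k, 2)  # overwrites go to file 1 and make most of file 0 stale
print(sorted(log.files))           # [1]
print(log.get("d"), log.get("a"))  # 1 2
```

After the three overwrites only `d` is still current in file 0 (fill rate 0.25, below 0.5), so `d` is copied to file 1 and file 0 is deleted, without touching any other file.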

Needless to say, I have not yet written the patch for this.
As for the claim that the current on-disk format is not 'scalable',
this appears to be a misuse of the term, as I understand it. In a
practical large system there would be many databases on many servers;
compacting any fraction of them at any one time is not such a burden.
That said, CouchDB's format is predicated on the notion that disks are
cheap (a notion borne out by reality, fortunately). Keeping disk usage
closer to the size of the live data requires in-place editing, and
there be dragons.

B.


On Mon, May 31, 2010 at 4:07 PM, Filipe David Manana <fdmanana@gmail.com> wrote:
> On Mon, May 31, 2010 at 3:58 PM, Matteo Redaelli
> <matteo.redaelli@gmail.com> wrote:
>
>> Many thanks for your answer, Till.
>>
>> Yes, I could move the view files by changing the configuration file or by
>> using symbolic links...
>>
>> But the problem is the size of the DB: having the whole DB in a SINGLE
>> file is quite a limited solution, not very scalable...
>> It would be nice if the data of any CouchDB database could be distributed
>> across several folders (which could be several file systems on a
>> Unix/Linux box).
>>
>
> Hi Matteo,
>
> I just submitted a patch yesterday that does that:
> https://issues.apache.org/jira/browse/COUCHDB-753
>
> Basically, it allows you to have several directories where DBs reside (it
> also allows compaction to happen in a different directory). I would like to
> do the same for view indexes, but I'm waiting for that patch to be approved.
> If it doesn't get approved, a patch for view indexes is unlikely to be
> approved either.
>
>
>>
>> Regards
>> Matteo.
>>
>>
>>
>>
>> On Mon, May 24, 2010 at 6:53 PM, till <till@php.net> wrote:
>>
>> > On Mon, May 24, 2010 at 9:25 AM, Matteo Redaelli
>> > <matteo.redaelli@gmail.com> wrote:
>> > > Hello
>> > >
>> > > I currently use CouchDB as the repository for my Ebot project
>> > > (http://www.redaelli.org/matteo-blog/projects/ebot/). My DB is getting
>> > > bigger and bigger:
>> > >
>> > >   -rw-r--r-- 1 couchdb  81099821870 2010-05-23 ebot.couch
>> > >   -rw-r--r-- 1 couchdb    392589468 2010-05-23 .ebot_design
>> > >
>> > > For better scalability, it would be better if:
>> > >
>> > > 1) CouchDB wrote data not to one single file but to different files in
>> > > DIFFERENT directories: this way it would be easier to add new devices,
>> > > file systems, ...
>> > > 2) CouchDB distributed (not only replicated) data among several
>> > > instances, like Riak and other NoSQL databases.
>> > >
>> > > Do you think these 2 features could be implemented in the near future,
>> > > or are they out of the scope of the CouchDB project?
>> > >
>> > > Thanks in advance
>> >
>> > Hi Matteo,
>> >
>> > You should try CouchDB-Lounge for your setup. It doesn't yet handle
>> > partitioning (and auto-balancing, etc.) as smoothly as Riak or Cassandra
>> > do, but it should get you started.
>> >
>> > Further reading:
>> > http://tilgovi.github.com/couchdb-lounge/
>> > http://groups.google.com/group/couchdb-lounge
>> > http://till.klampaeckel.de/blog/archives/84-A-toolchain-for-CouchDB-Lounge.html
>> >
>> > CouchDB-Lounge consists of nginx, a twistd daemon and several CouchDB
>> > servers, which can of course all run on the same machine. It'll help you
>> > use all of your resources until you outgrow your current server; then
>> > it's trivial to move a couple of your shards to another one and spread
>> > out.
>> >
>> > CouchDB-Lounge currently doesn't do auto-balancing, which would be useful
>> > when you add another server to your cluster. Instead, the suggested
>> > approach is to overshard (run more CouchDB instances than you need on the
>> > same server) or to add another lounge to the cluster and partition
>> > "behind" it.
>> >
>> > If you're running into space issues right now, it helps to move the view
>> > files (.ebot_*) to another disk and set view_index_dir in your local.ini
>> > accordingly.
>> >
>> > Hope that helps!
>> >
>> > Cheers,
>> > Till
>> >
>>
>>
>>
>> --
>> Matteo Redaelli
>> http://www.redaelli.org/matteo/
>>
>
>
>
> --
> Filipe David Manana,
> fdmanana@gmail.com
>
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."
>
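Till's suggestion to overshard can be illustrated with a toy placement model in Python. This is not CouchDB-Lounge's code: the CRC32 hash, the shard count of 16 and the round-robin placement are assumptions for illustration. Documents map to a fixed number of shards, so adding a server moves whole shards (whole CouchDB databases) rather than rehashing individual documents.

```python
import zlib

# Hypothetical: fixed up front, and deliberately more shards than servers.
NUM_SHARDS = 16


def shard_for(doc_id: str) -> int:
    """Pick a shard from the document id; this never changes when servers are added."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def place(num_servers: int) -> dict:
    """Hypothetical placement: assign shards to servers round-robin."""
    return {shard: f"server-{shard % num_servers}" for shard in range(NUM_SHARDS)}


before = place(1)  # all 16 shards on one machine
after = place(2)   # add a second machine
moved = [s for s in range(NUM_SHARDS) if before[s] != after[s]]
print(len(moved))  # 8
```

Only the shard-to-server table changes; a document's shard stays the same, so growing the cluster means copying 8 whole shard databases to the new machine.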
