incubator-couchdb-user mailing list archives

From Glenn Rempe <gl...@rempe.us>
Subject Re: insert performance
Date Wed, 14 Oct 2009 23:48:27 GMT
Millions of DBs?

Wouldn't you run into filesystem limitations due to the fact that
CouchDB writes all of its DBs/indexes into a single dir?

e.g. for the ext3 filesystem: "There is a limit of 31998 sub-directories
per one directory, stemming from its limit of 32000 links per inode."

http://en.wikipedia.org/wiki/Ext3

And my limited knowledge of filesystem internals says that the more
files you have in a single dir, the longer it will take to look up
any one of those files.

"The ext2 inode specification allows for over 100 trillion files to
reside in a single directory, however because of the current
linked-list directory implementation, only about 10-15 thousand files
can realistically be stored in a single directory.  This is why
systems such as Squid (http://www.squid-cache.org ) use cache
directories with many subdirectories - searching through tens of
thousands of files in one directory is sloooooooow."

http://answers.google.com/answers/threadview/id/122241.html

Of course this will vary by filesystem in absolute terms, but I think
the concept is the same for all current file systems. No?

CouchDB might really be able to address this if it did something like
making subdirs under the couchdb data dir, with names derived from
portions of a hash of the filename.  Using such a 2 or 3 level deep
dir structure would indeed allow for a huge number of DBs.

e.g. if the db name hashes to a123df4g34fd (stored as a123df4g34fd.couch today)

Make dirs/files like:

DATA_DIR/a1/23/df/4g34fd.couch  # DB
DATA_DIR/a1/23/df/.some_view_index_hidden_dir  # view index

No?
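
To make the idea concrete, here is a rough Python sketch of how such a
path could be derived. It is only an illustration of the layout
suggested above, not how CouchDB actually stores anything. The function
name sharded_db_path, the choice of SHA-1, and the example data dir are
all my own assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_db_path(data_dir, db_name, levels=3, width=2):
    """Hypothetical helper: map a DB name to a hashed, nested path.

    Assumptions (not CouchDB behaviour): SHA-1 of the DB name is used as
    the hash, and the first `levels` chunks of `width` hex characters
    become subdirectory names.
    """
    digest = hashlib.sha1(db_name.encode("utf-8")).hexdigest()
    # Leading chunks of the digest become the nested subdirectories.
    subdirs = [digest[i * width:(i + 1) * width] for i in range(levels)]
    # The remainder of the digest becomes the file name.
    remainder = digest[levels * width:]
    return PurePosixPath(data_dir, *subdirs, remainder + ".couch")


if __name__ == "__main__":
    # "/var/lib/couchdb" and "userdb-42" are made-up example values.
    print(sharded_db_path("/var/lib/couchdb", "userdb-42"))
```

With two hex characters per level, no directory ever holds more than
256 subdirs, which is far below the ext3 limit quoted above, and three
levels give 256^3 (about 16.7 million) leaf dirs to spread the DB files
over.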

On Tue, Oct 13, 2009 at 3:59 PM, Chris Anderson <jchris@apache.org> wrote:
> On Tue, Oct 13, 2009 at 6:29 AM, Brian Karlak <zenkat@metaweb.com> wrote:
>>
>> One caveat, however: we have one (somewhat funky) usecase which creates a
>> large number of small databases.  Could the existence of several thousand
>> small databases affect performance?
>>
>
> We definitely support the many-databases use case (eg: 1 per user, aka
> millions of databases). I think there is extra support for that in
> 0.9.1, and of course 0.10 has only improved from there.
>
> Chris
>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt
