incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: re-index efficiency
Date Thu, 22 Oct 2009 15:12:33 GMT
Fabio,

There are about four things that will slow view generation down from
the _bulk_docs rate:

1. JSON conversion (twice) when passing data to the view process
2. Collation of keys on tree insertion
3. I/O (Disk and stdio)
4. Memory thresholds
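
To make item 1 concrete, here is a minimal Python sketch of the double JSON conversion. It is an illustration only, not CouchDB's actual code: the function names and the `by_type` map function are hypothetical, and the real view server exchanges these strings over stdio rather than in-process.

```python
import json

def map_via_json(docs, map_fun):
    """Simulate an external view process: every doc is encoded and decoded
    on the way in, and the emitted rows are encoded and decoded on the way out."""
    rows = []
    for doc in docs:
        wire_in = json.dumps(doc)               # conversion 1: database -> view process
        emitted = map_fun(json.loads(wire_in))
        wire_out = json.dumps(emitted)          # conversion 2: view process -> database
        rows.extend(json.loads(wire_out))
    return rows

def map_native(docs, map_fun):
    """Simulate a native view: the map function sees the docs directly."""
    rows = []
    for doc in docs:
        rows.extend(map_fun(doc))
    return rows

def by_type(doc):
    # Hypothetical map function emitting [key, value] pairs.
    return [[doc["type"], doc["_id"]]]

docs = [{"_id": str(i), "type": "post" if i % 2 else "comment"} for i in range(1000)]
json_rows = map_via_json(docs, by_type)
native_rows = map_native(docs, by_type)
```

Both paths produce the same rows; the JSON path just does two extra encode/decode passes per document, which is the overhead a native view skips.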

Things like native views will give noticeable speed improvements
because they avoid JSON serialization and transfer over stdio. The
other (theoretically) tunable parameter is the memory threshold that
triggers flushes to disk. It's not currently configurable by the client
(changing it requires a rebuild of couchdb), and as such I haven't seen
anyone attempt to tune it.
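
As a rough sketch of what a memory threshold like that does (again, not the actual couchdb implementation; `ThresholdBuffer`, `threshold_bytes` and `flush_fn` are made-up names, and `sorted` is only a stand-in for real key collation):

```python
class ThresholdBuffer:
    """Accumulate emitted rows in memory and flush them in one batch
    once their approximate size reaches a fixed threshold."""

    def __init__(self, threshold_bytes, flush_fn):
        self.threshold_bytes = threshold_bytes
        self.flush_fn = flush_fn      # called with a sorted list of (key, value) rows
        self.rows = []
        self.size = 0

    def add(self, key, value):
        self.rows.append((key, value))
        self.size += len(key) + len(value)   # crude size estimate for string rows
        if self.size >= self.threshold_bytes:
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(sorted(self.rows))  # one batched write instead of many small ones
            self.rows = []
            self.size = 0

batches = []
buf = ThresholdBuffer(threshold_bytes=16, flush_fn=batches.append)
for i in range(10):
    buf.add("k%d" % i, "v%d" % i)    # each row counts as 4 bytes here
buf.flush()                          # write out whatever is left
```

A larger threshold means fewer, bigger flushes at the cost of more memory, which is the trade-off that tuning would be about.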

HTH,
Paul Davis

On Thu, Oct 22, 2009 at 6:55 AM, Fabio Forno <fabio.forno@gmail.com> wrote:
> Hi,
> not knowing the internals of couchdb I may ask a stupid question, so
> just ignore it if it's really stupid ;)
>
> Using it I've noticed that re-indexing takes a time comparable to
> the insertion of all the documents without using bulk inserts,
> while with bulk inserts the insertion of documents is much faster.
> Instead, in my idea, re-indexing should be as fast as bulk
> inserts, since when computing an index we don't need to do many
> fsyncs, but can instead allow maximum caching before disk writes (with
> berkeley db for example, sustained writes of data exceeding the memory
> cache are 100-1000x faster without syncs for each write). So, since I
> don't think that this relative slowness is due to fsyncs, what is the
> main reason? (another hint which rules out fsyncs is that cpu usage is
> rather high and not in waiting state)
>
> --
> Fabio Forno,
> Bluendo srl http://www.bluendo.com
> jabber id: ff@jabber.bluendo.com
>
