couchdb-user mailing list archives

From Robert Samuel Newson <rnew...@apache.org>
Subject Re: Indexes are inaccessible while indexers are running
Date Tue, 01 Nov 2016 22:17:15 GMT
Hi,

Indexing is incremental, and CouchDB brings a view up to date with the database when the
view is queried. You can call the view periodically from cron or similar so that the view
is pre-built (or at least fresher).
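
A minimal sketch of such a priming job in Python, in case it helps. The base URL and the
database, design document and view names are placeholders (assumptions, not taken from your
setup). It uses limit=0, as Adam suggests further down the thread, so the response stays small.

```python
import urllib.parse
import urllib.request


def view_url(base, db, ddoc, view, **params):
    """Build a view URL. base, db, ddoc and view are placeholder names."""
    path = "/".join(
        urllib.parse.quote(part, safe="")
        for part in (db, "_design", ddoc, "_view", view)
    )
    query = urllib.parse.urlencode(params)
    return f"{base.rstrip('/')}/{path}" + (f"?{query}" if query else "")


def prime_view(base, db, ddoc, view, timeout=3600):
    """Query the view with limit=0 so the indexer catches up with the database.

    By default the request blocks until the index is up to date, so the
    timeout is generous (an assumed value; tune it for your data size).
    """
    url = view_url(base, db, ddoc, view, limit=0)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status
```

Calling prime_view("http://localhost:5984", "mydb", "myddoc", "myview") from a script that
cron runs every few minutes would be one way to keep the view fresh.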

Dumping in 40 million documents first and only then querying the views is the worst-case view
building experience. Adam is saying you could have queried the view while inserting documents
and thus the view would be built by the time you had finished the import.
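
As a rough sketch of that approach, the import loop could query the view every few batches.
The two callables here are hypothetical and supplied by you: post_batch might POST the
batch to /<db>/_bulk_docs, and prime might issue a limit=0 query against the view. The batch
size and priming interval are arbitrary assumed values.

```python
def import_with_priming(docs, post_batch, prime, batch_size=1000, prime_every=10):
    """Import docs in batches, querying the view every prime_every batches.

    post_batch(list_of_docs) and prime() are caller-supplied callables
    (hypothetical helpers, not part of CouchDB itself).
    """
    batch = []
    sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            post_batch(batch)
            batch = []
            sent += 1
            # Query the view now so indexing keeps pace with the import.
            if sent % prime_every == 0:
                prime()
    if batch:
        post_batch(batch)  # final partial batch
    prime()  # one last query so the view covers the whole import
```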

To be clear, once the view over 40 million docs is built, adding one more doc causes only
one more doc's worth of indexing before the view is again queryable, which will happen so
quickly that you won't even notice.

If you want to query the view in its current state, rather than the default behaviour of blocking
until it is up to date with the input database, you can use the ?stale=ok parameter.
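
For illustration, a small Python sketch of a stale query. The base URL and the database,
design document and view names are placeholders (assumptions). The only stale values it
accepts are the two mentioned in this thread, ok and update_after.

```python
import json
import urllib.parse
import urllib.request


def stale_view_url(base, db, ddoc, view, stale="ok", **params):
    """Build a view URL that asks for the index in its current state."""
    if stale not in ("ok", "update_after"):
        raise ValueError("stale must be 'ok' or 'update_after'")
    query = urllib.parse.urlencode({"stale": stale, **params})
    return f"{base.rstrip('/')}/{db}/_design/{ddoc}/_view/{view}?{query}"


def query_stale(base, db, ddoc, view, stale="ok", **params):
    """Fetch the view without waiting for the indexer and decode the JSON body."""
    url = stale_view_url(base, db, ddoc, view, stale=stale, **params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

The rows returned this way may lag behind the database, which is the trade-off for not
blocking.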

B.

> On 1 Nov 2016, at 21:39, Graham Bull <calzakk+couchdb@gmail.com> wrote:
> 
> Hi Adam,
> 
> Thanks for your reply. I'll have a look at the option when I get a chance,
> looks like it could be useful.
> 
> I'm a little concerned though. You say the indexes don't update
> automatically. What happens when values in indexed fields change?
> Presumably the indexes aren't therefore updated, in which case they must be
> periodically manually recreated?
> 
> I'm running the latest version, 2.0.0, on a decent desktop machine (i7, 8
> cores, 16GB RAM, SSDs). Going forward, our application would run on higher
> spec servers.
> 
> FYI, the indexers actually took close to 4 hours to do all the indexing,
> not the 2.5 I thought it would take. One thing I didn't mention was that
> I'd created 4 separate indexes, and it's possible we'd need more. It's
> looking likely that CouchDB isn't a good fit for what we're doing, which is
> a shame because it has lots of positives.
> 
> Graham
> 
> 
> 
> On 1 November 2016 at 16:08, Adam Kocoloski <kocolosk@apache.org> wrote:
> 
>> Hi Graham, the indexes don’t update automatically, but it is possible to
>> prime the indexers by issuing a query to the view at any point during the
>> import. One interesting option for you is the "?stale=update_after" flag,
>> which will respond with the current state of the view index and trigger a
>> background update of the indexes after the fact:
>> 
>> GET /<db>/_design/<ddoc>/_view/<view>?stale=update_after
>> 
>> You could also add a &limit=0 if you’re only interested in priming the
>> indexers.
>> 
>> As far as indexing performance is concerned … ~4500 docs/second isn’t
>> awesome, but the devil is in the details: how many times is each document
>> indexed? Does the server have adequate CPU and IO? Are you running 2.0 or
>> one of the 1.x versions?
>> 
>> I can dig up some benchmarks but I’m certain I’ve seen (Linux) systems
>> index several times faster than that. I haven’t seen a lot of extensive
>> performance testing on Windows though. Cheers,
>> 
>> Adam
>> 
>>> On Oct 31, 2016, at 7:41 AM, Graham Bull <calzakk+couchdb@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> I'm currently evaluating CouchDB (and other NoSQL databases).
>>> 
>>> I have a number of databases of various sizes. After restarting the CouchDB
>>> service (I'm on Windows) eight "indexer" tasks started running on the
>>> largest database (40 million documents), which was recently imported.
>>> 
>>> After 30 minutes the progress on all tasks is 20%. In the meantime I can't
>>> run any queries using the database's indexes. At this rate, it'll take
>>> around 2.5 hours to index the entire database.
>>> 
>>> Presumably, when indexes are created, they're initially empty? And the
>>> indexer tasks are required to do the actual indexing? If so, then the
>>> performance is pretty bad. It took nearly 2 hours to import the 40 million
>>> records. Add on index creation, and you're looking at 4.5 hours. Without
>>> mentioning other relational and NoSQL databases by name, or giving any
>>> stats, CouchDB's import and indexing performance is pretty bad in
>>> comparison.
>>> 
>>> Is there a way to force the indexers to run immediately after importing the
>>> data, and to query the indexing status so that my app can wait until it's
>>> completed?
>>> 
>>> Thanks in advance,
>>> 
>>> Graham
>> 
>> 

