couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (COUCHDB-1076) _all_docs performance degrades as doc_del_count increases
Date Mon, 10 Mar 2014 03:45:43 GMT


ASF subversion and git services commented on COUCHDB-1076:

Commit 8d37ee36f23e5aea09f1fd17fd783b74a45d2e7e in couchdb's branch refs/heads/master from
[;h=8d37ee3 ]

Update pagination docs - COUCHDB-1076 is old now

As far as I'm aware, skip is equivalently fast to a startkey search
because whole subtrees are skipped when their document count does
not exceed the remaining skip.

> _all_docs performance degrades as doc_del_count increases
> ---------------------------------------------------------
>                 Key: COUCHDB-1076
>                 URL:
>             Project: CouchDB
>          Issue Type: Sub-task
>            Reporter: Nathan Vander Wilt
>            Assignee: Randall Leeds
>             Fix For: 1.1.1, 1.2
> The time required to query _all_docs?limit=1 can be proportional to the doc_del_count
of the database depending on where the deleted docs are in the sort order. As kocolosk explained
on IRC:
> The tree that serves as the source for _all_docs contains a small record for each document,
deleted or not ... if you have a large number of deleted docs and their IDs are interspersed
with the non-deleted ones i can imagine that it would cause additional seeks when streaming
the _all_docs response
> In my use case ( and in other cases (e.g. a rolling log
or any long-lived database), the deleted docs may all be at the beginning of the _all_docs
view, making query performance end up like using "?limit=N&skip=doc_del_count".
> To improve the performance in cases where large blocks of documents have been deleted,
kocolosk notes:
> [10:30am] kocolosk: the inner nodes in the btree currently report the doc_count and doc_del_count
> [10:31am] kocolosk: we might be able to rewrite the function that walks the btree so
that it checks if the doc_count underneath an inner node is zero
> [10:31am] kocolosk: and then it can skip that part of the tree entirely
> [10:31am] kocolosk: instead of descending all the way to the leaf nodes and skipping
deleted documents one by one
> [10:31am] n[ate]vw: yeah, that'd have the benefit of being a code-only change
> [10:31am] kocolosk: right
> [10:32am] kocolosk: i think it should work rather nicely, though it probably requires
some interesting spelunking into the depths of couch_btree
> [10:33am] kocolosk: davisp: whaddya think?  would what i'm proposing be possible? (checking
for doc_count=0 in the inner node reduction and then skipping ahead when serving _all_docs)
> ...
> [10:39am] davisp: n[ate]vw: Its definitely doable, its just a question of how to do it
so that I don't bleed from the eyes when trying to read through the implementation
> ...
> [10:40am] rnewson: davisp: right. we definitely want to preserve the high readability
of the current btree code. /s
> [10:40am] davisp: kocolosk: The only thing I can think of is passing a function to the
view iteration that gets called to evaluate whether it should decend to a child node based
on the key/reduction pair
> [10:41am] kocolosk: davisp: what you're proposing would also allow for (more) efficient
implementation of skip

This message was sent by Atlassian JIRA

View raw message