couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Randall Leeds (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-1076) _all_docs performance degrades as doc_del_count increases
Date Tue, 08 Mar 2011 01:19:59 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003752#comment-13003752
] 

Randall Leeds commented on COUCHDB-1076:
----------------------------------------

I did the digging and I think I've got an elegant fix. A little more work to be done before
I post a patch but I wanted to let you all know so we avoid duplicate work. If you've thought
about, or are curious about, this or have partial work of your own, ping me on IRC (tilgovi).

> _all_docs performance degrades as doc_del_count increases
> ---------------------------------------------------------
>
>                 Key: COUCHDB-1076
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1076
>             Project: CouchDB
>          Issue Type: Bug
>            Reporter: Nathan Vander Wilt
>
> The time required to query _all_docs?limit=1 can be proportional to the doc_del_count
of the database depending on where the deleted docs are in the sort order. As kocolosk explained
on IRC:
> The tree that serves as the source for _all_docs contains a small record for each document,
deleted or not ... if you have a large number of deleted docs and their IDs are interspersed
with the non-deleted ones i can imagine that it would cause additional seeks when streaming
the _all_docs response
> In my use case (https://github.com/natevw/RQMS) and in other cases (e.g. a rolling log
or any long-lived database), the deleted docs may all be at the beginning of the _all_docs
view, making query performance end up like using "?limit=N&skip=doc_del_count".
> To improve the performance in cases where large blocks of documents have been deleted,
kocolosk notes:
> [10:30am] kocolosk: the inner nodes in the btree currently report the doc_count and doc_del_count
> [10:31am] kocolosk: we might be able to rewrite the function that walks the btree so
that it checks if the doc_count underneath an inner node is zero
> [10:31am] kocolosk: and then it can skip that part of the tree entirely
> [10:31am] kocolosk: instead of descending all the way to the leaf nodes and skipping
deleted documents one by one
> [10:31am] n[ate]vw: yeah, that'd have the benefit of being a code-only change
> [10:31am] kocolosk: right
> [10:32am] kocolosk: i think it should work rather nicely, though it probably requires
some interesting spelunking into the depths of couch_btree
> [10:33am] kocolosk: davisp: whaddya think?  would what i'm proposing be possible? (checking
for doc_count=0 in the inner node reduction and then skipping ahead when serving _all_docs)
> ...
> [10:39am] davisp: n[ate]vw: Its definitely doable, its just a question of how to do it
so that I don't bleed from the eyes when trying to read through the implementation
> ...
> [10:40am] rnewson: davisp: right. we definitely want to preserve the high readability
of the current btree code. /s
> [10:40am] davisp: kocolosk: The only thing I can think of is passing a function to the
view iteration that gets called to evaluate whether it should decend to a child node based
on the key/reduction pair
> [10:41am] kocolosk: davisp: what you're proposing would also allow for (more) efficient
implementation of skip

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message