[ https://issues.apache.org/jira/browse/COUCHDB-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867197#action_12867197 ] Damien Katz commented on COUCHDB-738: ------------------------------------- I've been thinking about this issue, and I think storing the whole rev tree in the by_seq index is a bad idea. I'm also thinking of ways to make the compaction faster. Store the full_doc_info outside the by_id btree, but store the doc_info instead (with it's pointers to the main doc and conflicts) with a pointer to the full_doc_info as well. On reads, this avoids the overhead of loading up the ful_doc_info just to get the main doc. Updates and replications reads (that request the rev_info) will have to load up the full_doc_info with an extra read, but unlikely to be an extra disk io since it will be close the most recent doc revision. The by_seq index will also have the doc_info and a pointer to the full_doc_info. Then on compaction, scan the by_seq index, copying over the full_doc_info and the documents and attachments into a new file. Each full_doc_info should be linked to the next with an offset to next's file pos. Then scan the newly written full_doc_info, converting them to doc_infos and pointers to the full_doc_info, and writing them out consecutively to the end of the file. Then sort just this portion of the file, on disk, by the id in the doc_info. This is the most expensive part of the compaction, but sorting things on disk is common problem with lots of open libraries out there that are highly optimized. Then convert this id sorted portion of the file to btree leaf nodes. Then rescan leaf nodes and build up the inner nodes, recurse until you have single root node left. This is now your by_id index. Then rescan the full_doc_infos, and write out to the end of the file the doc_infos and pointers back to the full_doc_infos. This is already sorted by_seq. Then convert this seq sorted portion of the file to btree leaf nodes. Then rescan leaf nodes and build up the inner nodes, recurse until you have single root node left. This is now your by_seq index. You now a fully compacted database with no wasted btree nodes. I think this will be a lot faster. With the exception of the by_id sorting phase, this eliminates random disk seeks. > more efficient DB compaction (fewer seeks) > ------------------------------------------ > > Key: COUCHDB-738 > URL: https://issues.apache.org/jira/browse/COUCHDB-738 > Project: CouchDB > Issue Type: Improvement > Components: Database Core > Affects Versions: 0.9.2, 0.10.1, 0.11 > Reporter: Adam Kocoloski > Assignee: Adam Kocoloski > Fix For: 1.1 > > Attachments: 738-efficient-compaction-v1.patch, 738-efficient-compaction-v2.patch > > > CouchDB's database compaction algorithm walks the by_seq btree, then does a lookup in the by_id btree for every document in the database. It does this because the #full_doc_info{} record with the full revision tree is only stored in the by_id tree. I'm proposing instead to store duplicate copies of #full_doc_info{} in both trees, and to have the compactor use the by_seq tree exclusively. The net effect is significantly fewer calls to pread(), and an compaction IO pattern where reads tend to be clustered close to each other in the file. > If the by_id tree is fully cached, or if the id tree nodes are located near the seq tree nodes, the performance improvement is small but noticeable (~10% in some simple tests). On the other hand, in the worst-case scenario of randomly-generated docids and a database much larger than main memory the improvement is huge. Joe Williams did some simple benchmarks with a 50k document, 600 MB database on a 256MB VPS. The compaction time for that DB dropped from 15m to 2m20s, so more than 6x faster. > Storing the #full_doc_info{} in the seq tree also allows for some similar optimizations in the replicator. > This patch might have downsides when documents have a large number of edits. These include an increase in the size of the database and slower view indexing. I expect both to be small effects. > The patch can be applied directly to trunk@934272. Existing DBs are still readable, new updates will be written in the new format, and databases can be fully upgraded by compacting. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.