couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-994) Crash after compacting large views
Date Fri, 18 Mar 2011 17:06:30 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008516#comment-13008516 ]

Adam Kocoloski commented on COUCHDB-994:
----------------------------------------

I had a chance to delve into this and I think there's a very real bug here.  The problem is
two-fold:

1) The header for the compacted index is written much later than it could be.
2) We don't try to use the .compact index storage if the primary storage is missing.

We can fix the first problem if we simply write a header in the view compactor process before
we send the compact_done message to the view group.  The current system doesn't write the
header until the view group processes the 'delayed_commit' message that it sends to itself
when it switches the file over.  If the group process closes at any point in the interim, the
header is never written and we're going to reset the index.
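
To make the ordering in fix #1 concrete, here is a minimal sketch in Python rather than
CouchDB's Erlang.  Everything in it is an assumption made for illustration: the line-based
file layout, the HDR/ROW markers, and the names compact, read_last_header and notify are
hypothetical and do not reflect the real on-disk format or the couch_view_compactor API.
The only point it shows is that the header reaches disk before compact_done is sent.

```python
import json
import os

HEADER_MARK = b"HDR "  # hypothetical marker, not CouchDB's real file format


def read_last_header(path):
    """Return the last header record in the file, or None if there is none."""
    last = None
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(HEADER_MARK):
                last = json.loads(line[len(HEADER_MARK):])
    return last


def compact(path, rows, current_seq, notify):
    """Write rows, then a durable header, and only then signal completion."""
    with open(path, "ab") as f:
        for row in rows:
            f.write(b"ROW " + json.dumps(row).encode() + b"\n")
        # Header goes out (and is fsynced) before anyone is told we are done.
        header = json.dumps({"current_seq": current_seq}).encode()
        f.write(HEADER_MARK + header + b"\n")
        f.flush()
        os.fsync(f.fileno())
    # Stand-in for sending compact_done to the view group.
    notify("compact_done")
```

With this ordering, a receiver that dies right after getting compact_done still leaves a
file whose last header carries a valid current_seq.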

The second problem seems a bit tricky on the face of it, but I think it will work out OK.
View group compaction, unlike database compaction, is not all that easy to resume.  We don't
have access to detailed sequence numbers for each piece of the tree; all we have is a single
current_seq for the entire group.  But that's alright.  I think all we need to do is implement
the fix for #1, and then change the view group process to check for the presence of a .compact
file if the primary storage is missing.  Then one of three things can happen:

1) no .compact file, so we create a new file and index from scratch
2) .compact file is partially written, but if we seek backwards to find the header it will
still have current_seq = 0, so we'll index from scratch
3) .compact file is fully written and has a valid current_seq.  We rename it and have a successful
recovery.
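
The three cases above can be sketched roughly as follows.  This is Python for illustration
only, not the actual view group code: recover_view_file, the ".compact" suffix handling and
the read_current_seq callback are hypothetical names, and the callback stands in for
whatever backwards header scan the real file module performs.

```python
import os


def recover_view_file(primary_path, read_current_seq):
    """Decide how to open a view group file when the primary may be missing.

    read_current_seq(path) is a caller-supplied (hypothetical) function that
    returns the current_seq from the last header it finds, or None/0 if the
    file has no usable header.
    """
    compact_path = primary_path + ".compact"

    if os.path.exists(primary_path):
        return "use_primary"

    # Case 1: no .compact file either, so start over.
    if not os.path.exists(compact_path):
        return "index_from_scratch"

    # Case 2: partially written .compact file; no header with a real seq.
    # (What to do with the leftover file is not covered by this sketch.)
    seq = read_current_seq(compact_path)
    if not seq:
        return "index_from_scratch"

    # Case 3: fully written .compact file; promote it to primary storage.
    os.rename(compact_path, primary_path)
    return "recovered"
```

Passing the header scan in as a callback keeps the sketch independent of any particular
file format; in case 2 that scan is the step that may take a long time.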

The second case can potentially block the view group for a long period of time as it scans
backwards through the file.  The third case is obviously the one we have to be the most careful
about, particularly when it comes to database deletion.  It looks to me like couch_view:do_reset_indexes/2
takes care of .compact files as well as primary storage, so I don't think adding recovery
changed our behavior at all there.

> Crash after compacting large views
> ----------------------------------
>
>                 Key: COUCHDB-994
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-994
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 1.0.2
>         Environment: Centos5 64bit vm with 2CPU and 4G RAM running Erlang R14B and configured to use the 64bit js-devel libraries.
> URL: http://svn.apache.org/repos/asf/couchdb/branches/1.0.x
> Repository Root: http://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 1050680
>            Reporter: Bob Clary
>         Attachments: couch_errors.txt, couch_errors_2.txt
>
>
> The database has over 9 million records. Several of the views are relatively dense in
> that they emit a key for most documents. The views are successfully created initially but
> with relatively large sizes from 20 to 95G. When attempting to compact them, the server will
> crash upon completion of the compaction.
> This does not occur with the released 1.0.1 version but does with the 1.0.x svn version.
> I'll attach example logs. Unfortunately they are level error and may not have enough information.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
