couchdb-dev mailing list archives

From Randall Leeds
Subject Re: silent view index file corruption
Date Wed, 07 Apr 2010 02:06:06 GMT
I immediately want to say 'ini file option', but I'm not sure whether the
default should err on the side of safety or speed.

Maybe this is a good candidate for Merkle trees or something else we can do
throughout the view tree that might have less overhead than MD5 summing all
the nodes? After all, most inner nodes shouldn't change most of the time. Some
incremental, cheap checksum might be a worthwhile *option*.
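
To make the incremental idea concrete, here is a rough sketch in Python rather
than Erlang. It is not CouchDB code: Node, node_digest and update_leaf are
hypothetical names, the tree is a toy in-memory structure, and MD5 is only a
placeholder for whatever hash would be chosen. Each node's digest covers its
own keys plus its children's digests, so changing a leaf only forces a rehash
along the path from that leaf to the root.

```python
import hashlib


class Node:
    """Toy tree node (hypothetical, not couch_btree's representation)."""

    def __init__(self, keys=(), children=()):
        self.keys = list(keys)
        self.children = list(children)
        self.digest = None  # cached digest; None means "needs recomputing"


def node_digest(node):
    """Return the Merkle-style digest of a node, reusing cached values."""
    if node.digest is None:
        h = hashlib.md5()  # placeholder hash function
        for key in node.keys:
            h.update(b"k" + repr(key).encode() + b"\0")
        for child in node.children:
            h.update(b"c" + node_digest(child))
        node.digest = h.digest()
    return node.digest


def update_leaf(path, new_keys):
    """Replace the keys of the leaf at the end of path (root..leaf).

    Only the nodes on the path lose their cached digest; sibling
    subtrees keep theirs and are not rehashed.
    """
    path[-1].keys = list(new_keys)
    for node in path:
        node.digest = None
```

With a root over two inner nodes and four leaves, updating one leaf
invalidates three digests (leaf, parent, root) and leaves the other subtree's
cached digest untouched, which is where the savings over hashing every node
would come from.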

On Apr 6, 2010 6:04 PM, "Adam Kocoloski" wrote:

Hi all, we recently had an EC2 node go AWOL for about 12 hours.  When it
came back, we noticed after a few days that a number of the view indexes
stored on that node were not updating.  I did some digging into the error
logs and with Paul's help pieced together what was going on.  I won't bother
you with all the gory details unless you ask for them, but the gist of it is
that those files are corrupted.

The troubling thing for me is that we only discovered the corruption when it
completely broke the index updates.  In one case, it did this by rearranging
the bits so that couch_file thought that the btree node it was reading from
disk had an associated MD5 checksum. It didn't (no btree nodes do), and so
couch_file threw a file_corruption exception.  But if the corruption had
shown up in another part of the file I might never have known.  In fact,
some of the other indices on that node probably are silently corrupted.

You might wonder how likely it is that a file becomes corrupted but still
appears to be functioning.  I checked the last modified timestamps for three
broken files.  One was last modified when the node went down, but the other
two had timestamps in between the node's recovery and now.  To me, that
means that the view indexer was able to update those files for quite a while
(~2 days) before it bumped into a part of the btree that was corrupted.

I wonder what we should do about this.  My first thought is to make it
optional to write btree nodes (possibly only for view index files?) using
append_term_md5 instead of append_term.  It seems like a simple patch, but I
don't know a priori what the performance hit would be.  Other thoughts?
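
For anyone who wants to see the trade-off in miniature, here is a small Python
sketch of the idea (not Erlang, and not couch_file's actual on-disk format).
The layout is an assumption made up for illustration: a 32-bit length prefix
whose high bit says whether a 16-byte MD5 follows, then the payload. The names
append_term, read_term and FileCorruption are hypothetical stand-ins, and JSON
stands in for term serialization.

```python
import hashlib
import io
import json
import struct

MD5_FLAG = 0x80000000   # assumed: high bit of the length prefix means "MD5 follows"
SIZE_MASK = 0x7FFFFFFF  # remaining 31 bits hold the payload length


class FileCorruption(Exception):
    """Raised when the stored checksum does not match the payload."""


def append_term(f, term, with_md5=False):
    """Append a serialized term to f and return the offset it was written at."""
    payload = json.dumps(term).encode()
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    header = len(payload) | (MD5_FLAG if with_md5 else 0)
    f.write(struct.pack(">I", header))
    if with_md5:
        f.write(hashlib.md5(payload).digest())
    f.write(payload)
    return pos


def read_term(f, pos):
    """Read the term at pos, verifying the MD5 only if one was stored."""
    f.seek(pos)
    (header,) = struct.unpack(">I", f.read(4))
    expected = f.read(16) if header & MD5_FLAG else None
    payload = f.read(header & SIZE_MASK)
    if expected is not None and hashlib.md5(payload).digest() != expected:
        raise FileCorruption("MD5 mismatch at offset %d" % pos)
    return json.loads(payload)
```

In this toy layout, a flipped byte inside a checksummed payload makes
read_term raise FileCorruption, while the same flip inside an unchecksummed
payload can decode to a different but valid-looking term, which is the silent
case described above. The extra cost per node is one MD5 computation on write,
one on read, and 16 bytes on disk.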

Best, Adam
