couchdb-dev mailing list archives

From "Igor Klimer (JIRA)" <>
Subject [jira] [Commented] (COUCHDB-2040) Compaction fails when copying attachment
Date Wed, 29 Jan 2014 08:00:18 GMT


Igor Klimer commented on COUCHDB-2040:

As per Robert's suggestion, I've tried replicating the database, and it handled the corrupted
document very well: the whole process appears to have succeeded. The original database has
130.2 GB and 1603291 documents; the replicated one has 121.9 GB and 1603279 documents (12 fewer;
is it normal that the number of documents changed?). But there is an error in the logs clearly
showing the document that's been giving me this much trouble:
[Wed, 29 Jan 2014 00:39:13 GMT] [error] [<0.28617.7>] Replicator: couldn't write document
`332720882465`, revision `1-32e947c4533449463d59a9caa8042677`, to target database `ecrepo2`.
Error: `md5_mismatch`.
So it seems that Benoit's hunch was right: it is an md5_mismatch.
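
For reference, here is a minimal sketch of what such an md5 check amounts to when done by hand
on a downloaded attachment. It assumes the `digest` field in the attachment stub has the form
`md5-<base64 of the raw MD5>` and that the attachment is stored unencoded (for a gzip-encoded
attachment the digest would presumably cover the encoded bytes). The function names are
hypothetical helpers, not part of CouchDB.

```python
import base64
import hashlib


def couch_md5_digest(data: bytes) -> str:
    """Build a digest string in the assumed 'md5-<base64>' stub format."""
    raw = hashlib.md5(data).digest()
    return "md5-" + base64.b64encode(raw).decode("ascii")


def digest_matches(data: bytes, stub_digest: str) -> bool:
    """Compare downloaded attachment bytes against the digest from the stub."""
    return couch_md5_digest(data) == stub_digest
```

If `digest_matches` returns False for the bytes read back from disk, the stored attachment no
longer corresponds to the digest recorded when it was written.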
I've checked the offending document: the attachment is a PDF (generated by us), and it opened
fine once, with a small glitch in the text. However, subsequent requests seem to fail:
[Wed, 29 Jan 2014 07:54:04 GMT] [error] [<0.29843.7>] Uncaught error in HTTP request:
[Wed, 29 Jan 2014 07:54:04 GMT] [error] [<0.29843.7>] httpd 500 error response:
Interestingly, CouchDB seems to "hang": it doesn't return any error to the client, and the
error shows up only in the logs.
Of course, this document does not exist in the replicated database.
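
To see which documents account for the difference in counts, the ID lists of the two databases
can be compared. The sketch below assumes the bodies of the `_all_docs` responses from `ecrepo`
and `ecrepo2` have been saved as JSON text with the usual `rows` array of `{"id": ...}` entries;
the helper names are hypothetical. With about 1.6 million documents, loading both responses
into memory at once is an assumption that may not hold on a small machine.

```python
import json


def doc_ids(all_docs_json: str) -> set:
    """Extract document IDs from the body of an _all_docs response."""
    body = json.loads(all_docs_json)
    return {row["id"] for row in body["rows"]}


def missing_in_target(source_json: str, target_json: str) -> list:
    """Return IDs present in the source database but absent from the target."""
    return sorted(doc_ids(source_json) - doc_ids(target_json))
```

Any ID returned by `missing_in_target` that has no matching replicator error in the logs would
be worth a closer look.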

> Compaction fails when copying attachment
> ----------------------------------------
>                 Key: COUCHDB-2040
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Igor Klimer
> Original discussion from the user mailing list:
> Digest:
> During database compaction, the process fails at about 50% with the following error: (CouchDB 1.2.0, Windows Server 2008 R2 Enterprise).
> After server and CouchDB upgrade the error is still the same: (CouchDB 1.5.0, Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64)).
> There was one prior attempt at compaction that failed because of insufficient disk space:
> After this initial failure, I've made sure that there's sufficient disk space for the .compact file.
> The .compact file was always removed before trying compaction again.
> At the request of Robert Samuel Newson, I've also tried with an empty .compact file - the results were the same:
> Our I/O subsystem consists of some RAID5 matrices - the admins claim that they've been running error-free since inception ;) We have yet to run a parity check, since that'd require taking the matrix offline and I'd rather not do that without exhausting other options.
> Config files from the 1.2.0/Windows server (since that's where the fault must have occurred):
> default.ini:
> local.ini:
> Other than the default delayed_commits set to true, there are no options that could affect fsync()ing and such.
> I've run:
> curl localhost:5984/ecrepo/_changes?include_docs=true
> curl localhost:5984/ecrepo/_all_docs?include_docs=true
> and both calls succeeded, which would suggest that a faulty attachment (incorrect checksum/length) is at fault somewhere.

