couchdb-dev mailing list archives

From "Alexey Loshkarev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (COUCHDB-2329) Log broken file name on compress/decompress error
Date Mon, 15 Sep 2014 12:29:33 GMT

     [ https://issues.apache.org/jira/browse/COUCHDB-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Loshkarev updated COUCHDB-2329:
--------------------------------------
    Description: 
Hello.

I'm using CouchDB for a fairly large data set: over 50 databases holding more than 500 million
documents, with a total disk size of about 2 TB, spread across a 4-node cluster.

In real life, hardware errors happen from time to time. Most of them don't affect CouchDB, but
some do: CouchDB ends up writing bad data to disk, or reading garbage back due to disk read
errors.

The bad thing is that CouchDB dies the moment it can't decompress data.

The worse thing is that CouchDB doesn't log the name of the broken file, which would help me
deal with the problem. If CouchDB showed me the broken file name, I could delete that file and
recreate it via replication from a healthy node.

The ugly thing is that instead I have to drop the whole node and re-replicate it. In my setup,
2 TB takes over a month to replicate! So the average state of my cluster is: 3 nodes up, and
the fourth replicating terabytes of data.

So, my proposal is to include the file name in the error message when CouchDB fails to
decompress data.

Sample message:
[Mon, 15 Sep 2014 11:51:17 GMT] [error] [emulator] Error in process <0.24789.1> with
exit value: {function_clause,[{couch_compress,decompress,[<<1952804468 bytes>>],[{file,"couch_compress.erl"},{line,67}]},{couch_file,pread_term,2,[{file,"couch_file.erl"},{line,135}]},{couch_btree,get_node,2,[{file,"couch_btree.erl"},{line,349}]},{couch_btree,modify_node...
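
I don't have a patch, but roughly this is what I mean. It is only a sketch, assuming the caller
(couch_file, for example, which knows the path of the .couch file it is reading) passes that
path along; decompress_logged/2 is a made-up name, and the real code may prefer CouchDB's own
logging macros over plain error_logger:

    %% Hypothetical wrapper, not existing CouchDB code: log which file the
    %% undecompressable term came from, then re-raise so the crash behaviour
    %% stays exactly as it is today.
    decompress_logged(Bin, FilePath) ->
        try
            couch_compress:decompress(Bin)
        catch
            Class:Reason ->
                error_logger:error_msg(
                    "Failed to decompress term read from ~s: ~p~n",
                    [FilePath, Reason]),
                %% erlang:get_stacktrace/0 is the pre-OTP-21 way to fetch the
                %% stacktrace of the exception we just caught.
                erlang:raise(Class, Reason, erlang:get_stacktrace())
        end.

Re-raising keeps the process crashing as before; the only change is that the log now names the
broken file, so I can delete and re-replicate just that database instead of the whole node.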





> Log broken file name on compress/decompress error
> -------------------------------------------------
>
>                 Key: COUCHDB-2329
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2329
>             Project: CouchDB
>          Issue Type: Improvement
>      Security Level: public (Regular issues)
>          Components: Database Core, Logging
>            Reporter: Alexey Loshkarev
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
