couchdb-user mailing list archives

From Greg Tarsa <gta...@axialproject.com>
Subject Re: CouchDB crash during compaction with no log messages
Date Sat, 12 Mar 2016 23:08:26 GMT
Current summary: It was not clear how to determine how much memory our application needs, and the abrupt failures we were seeing gave us no data on how to move forward.
For the record, we worked around the problem by using a RAID-1 volume that combines the instance storage with an EBS volume. This seems to mitigate the issue and gives us a persistent store that will outlast the AWS instance. It is not ideal, but it works for now.
Long term, we will likely move off CouchDB to PostgreSQL with jsonb. A database that crashes on memory errors without leaving a log trace is not a good production solution for us.

Thanks,
Greg
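One way to get the missing data on memory needs is to sample the resident set size (RSS) of the Erlang VM process while compactions run and size the instance from the observed peak. A minimal sketch, assuming Linux's `/proc/<pid>/status` format and that you supply the PID of the CouchDB beam process yourself (for example from `pgrep beam`); the function names are illustrative and not part of CouchDB:

```python
import time
from typing import Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract VmRSS (resident memory, in KiB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  123456 kB".
            return int(line.split()[1])
    return None


def read_vmrss_kib(pid: int) -> Optional[int]:
    """Current RSS of a process in KiB, or None if it cannot be read (e.g. it exited)."""
    try:
        with open("/proc/%d/status" % pid) as status:
            return parse_vmrss_kib(status.read())
    except OSError:
        return None


def sample_peak_kib(pid: int, seconds: float, interval: float = 1.0) -> int:
    """Poll the process for `seconds` and return the largest RSS seen, in KiB."""
    peak = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        rss = read_vmrss_kib(pid)
        if rss is None:  # process gone, e.g. the VM aborted
            break
        peak = max(peak, rss)
        time.sleep(interval)
    return peak


# Usage (hypothetical PID; start this, then trigger the compactions):
# print("peak RSS: %d KiB" % sample_peak_kib(12345, seconds=600))
```

If the peak approaches the instance's total memory just before the abort, that would support the out-of-memory reading; comparing a run with all compactions issued at once against a run with them issued one at a time would show how much the concurrency adds.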



> On Mar 8, 2016, at 3:33 PM, Greg Tarsa <gtarsa@axialproject.com> wrote:
> 
> All the compaction requests are made at the same time, so I assume they are running in parallel.
> 
> Does the out-of-memory error indicate a configuration problem? The message appears only at the end of the interactive session, it is not in any log, and the system did not kill the process for memory reasons, so I suspect a CouchDB malfunction is involved. Also, compaction works fine with an instance volume, and initial results from experiments we are running here with a RAID-1 volume that combines instance and EBS storage suggest that it works too.
> 
> If I need more memory, is there documentation or discussion somewhere that would guide me on how much I need?
> 
> Thanks,
> Greg
> 
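Because all compaction requests are issued at once, every compactor runs concurrently and their memory use adds up. A minimal sketch of running them one at a time instead, assuming an unauthenticated node at http://127.0.0.1:5984 (add admin credentials if your setup requires them). It POSTs to `/{db}/_compact`, which answers 202 Accepted as in the logs, then polls `GET /_active_tasks` until no `database_compaction` task for that database remains. The helper names and the poll interval are illustrative choices; the HTTP functions are passed in as parameters so the loop can be exercised without a server.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, List

COUCH_URL = "http://127.0.0.1:5984"  # assumption: local node, no auth


def _call(method: str, path: str) -> object:
    """Send a JSON request to CouchDB and decode the JSON response."""
    data = b"" if method == "POST" else None  # empty body for POST
    req = urllib.request.Request(COUCH_URL + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def start_compaction(db: str) -> None:
    # CouchDB replies 202 Accepted and compacts in the background.
    _call("POST", "/%s/_compact" % db)


def active_tasks() -> List[dict]:
    return _call("GET", "/_active_tasks")


def is_compacting(tasks: List[dict], db: str) -> bool:
    return any(t.get("type") == "database_compaction" and t.get("database") == db
               for t in tasks)


def compact_sequentially(dbs: Iterable[str],
                         start: Callable[[str], None] = start_compaction,
                         tasks: Callable[[], List[dict]] = active_tasks,
                         sleep: Callable[[float], None] = time.sleep,
                         poll_seconds: float = 5.0) -> None:
    """Compact each database in turn, waiting for one to finish before the next."""
    for db in dbs:
        start(db)
        # Wait once before the first check so the compactor has time to register
        # its task, then poll until the task disappears from /_active_tasks.
        sleep(poll_seconds)
        while is_compacting(tasks(), db):
            sleep(poll_seconds)


# Usage against a live node (database names as in the thread):
# compact_sequentially(["biometrics", "diabetes", "fitness", "nutrition", "routine",
#                       "sleep", "tobacco_cessation", "trackers", "users", "weight"])
```

Running sequentially trades total wall-clock time for a lower peak: at any moment only one database's compactor holds memory and a `.compact` file.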
> 
>> On Mar 8, 2016, at 1:41 PM, Jan Lehnardt <jan@apache.org> wrote:
>> 
>>> 
>>> On 08 Mar 2016, at 18:07, Greg Tarsa <gtarsa@axialproject.com> wrote:
>>> 
>>> Hi Jan,
>>> 
>>> Thanks for your quick reply to my question. I have some answers to your questions and some new information that I got from running couchdb interactively.
>>> 
>>> 
>>>> Are there any other things going on on the VM, when you do this?
>>> The VM also hosts a MySQL server, but I see no evidence that it contributes to the CouchDB issue.
>>> 
>>>> 
>>>> Can you reliably reproduce this behavior?
>>> I can reliably reproduce it.
>>> 
>>>> 
>>>> Are there other correlating factors (like does this always happen at the same time / due to a cronjob, etc)?
>>> It can be reproduced by restarting couchdb and re-requesting compaction on all databases. It is not time-dependent.
>> 
>> Are you running compaction on the databases in parallel or sequentially?
>> 
>> 
>> 
>>> 
>>>> 
>>>> Can you set your CouchDB log level to debug and see if that gets you more info? (curl -X PUT http://[user:pass@]127.0.0.1:5984/_config/log/level -d '"debug"').
>>> (see below)
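The body of that PUT must itself be JSON: the string `"debug"` including its double quotes, which is 7 bytes and matches the `Content-Length` of the successful request in the session log below. A bare `debug`, or a closing typographic quote picked up when copying the command from a mail client, is not valid JSON; that is one plausible explanation for the "attempted upload of invalid JSON" error and 400 response to the first PUT in that session. A small Python check of what does and does not parse (the helper names are illustrative):

```python
import json


def log_level_body(level: str) -> str:
    """Body for PUT /_config/log/level: the level encoded as a JSON string."""
    return json.dumps(level)


def is_valid_json(body: str) -> bool:
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


body = log_level_body("debug")
print(body, len(body))                 # "debug" 7
print(is_valid_json(body))             # True
print(is_valid_json("debug"))          # False: bare word, no quotes
print(is_valid_json("\"debug\u2019"))  # False: closing typographic quote
```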
>> 
>> The paste ends with an allocation error, which points to you running out of memory.
>> 
>> Best
>> Jan
>> --
>> 
>> 
>>> 
>>>> 
>>>> Is it possible for you to share these database files (publicly or in private)?
>>> The databases contain health data and I am unable to share them.
>>> 
>>>> 
>>>> What are your disk usage levels before/during compaction?
>>> There is plenty of disk in this case. The compacted data is in the 2G range, so the problem does not seem to be storage-size related. We are able to compact during regular operation when using a 40G instance volume, but unable to compact when using a 120G EBS volume.
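Compaction writes a new `<db>.couch.compact` file next to the original and swaps it in when finished (see the "swapping files" lines in the session log below), so every running compaction needs free space for its copy. A rough pre-flight check, assuming the current `.couch` file size as a conservative upper bound (the compacted copy typically drops old revisions and ends up smaller) and assuming all databases are compacted at once; the data directory comes from the session log and the function name is hypothetical:

```python
import os
import shutil
from typing import Iterable, Tuple


def compaction_headroom(data_dir: str, db_names: Iterable[str]) -> Tuple[int, int]:
    """Return (free_bytes, worst_case_bytes) for compacting db_names concurrently.

    worst_case_bytes is the sum of the current .couch file sizes, an upper
    bound on the space the .compact copies can take.
    """
    worst_case = sum(os.path.getsize(os.path.join(data_dir, name + ".couch"))
                     for name in db_names)
    free = shutil.disk_usage(data_dir).free
    return free, worst_case


# Usage (path as seen in the session log):
# free, needed = compaction_headroom("/usr/local/var/lib/couchdb", ["routine", "trackers"])
# print("free %.1f GiB, worst case %.1f GiB" % (free / 2**30, needed / 2**30))
```

Compacting one database at a time lowers the requirement to roughly the largest single file (routine.couch, about 8.6 GB in the directory listing further down).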
>>> 
>>>> 
>>>> Are you getting anything in the system log(s)?
>>> That is what is odd.  There is nothing in the system logs or the couchdb logs.
>>> 
>>> Bonus data:
>>> 
>>> I ran the debug experiment with couchdb running interactively. The session text is below, but note that I also got the following error message and an Erlang core dump:
>>> 
>>>  Crash dump was written to: erl_crash.dump
>>>  eheap_alloc: Cannot allocate 156725600 bytes of memory (of type "old_heap").
>>>  Aborted (core dumped)
>>> 
>>> The dump is ~500MB.
>>> 
>>> Here is the session text:
>>> 
>>> 
>>> [gtarsa@prod-db01 ~]$  sudo /usr/local/bin/couchdb
>>> 
>>> Apache CouchDB 1.6.1 (LogLevel=info) is starting.
>>> Apache CouchDB has started. Time to relax.
>>> 
>>> [info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>>> [info] [<0.120.0>] 127.0.0.1 - - GET /_config/level 200
>>> [info] [<0.678.0>] 127.0.0.1 - - GET /_config/log/level 200
>>> [error] [<0.803.0>] attempted upload of invalid JSON (set log_level to debug to log it)
>>> [info] [<0.803.0>] 127.0.0.1 - - PUT /_config/log/level 400
>>> 
>>> =SUPERVISOR REPORT==== 8-Mar-2016::11:09:22 ===
>>>   Supervisor: {local,couch_primary_services}
>>>   Context:    child_terminated
>>>   Reason:     normal
>>>   Offender:   [{pid,<0.92.0>},
>>>                {name,couch_log},
>>>                {mfargs,{couch_log,start_link,[]}},
>>>                {restart_type,permanent},
>>>                {shutdown,brutal_kill},
>>>                {child_type,worker}]
>>> 
>>> [debug] [<0.117.0>] 'PUT' /_config/log/level {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Length',"7"},
>>>        {'Content-Type',"application/x-www-form-urlencoded"},
>>>        {'Host',"127.0.0.1:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.117.0>] OAuth Params: []
>>> 
>>> =SUPERVISOR REPORT==== 8-Mar-2016::11:09:25 ===
>>>   Supervisor: {local,couch_primary_services}
>>>   Context:    child_terminated
>>>   Reason:     normal
>>>   Offender:   [{pid,<0.828.0>},
>>>                {name,couch_log},
>>>                {mfargs,{couch_log,start_link,[]}},
>>>                {restart_type,permanent},
>>>                {shutdown,brutal_kill},
>>>                {child_type,worker}]
>>> 
>>> [debug] [<0.826.0>] 'GET' /_all_dbs {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.826.0>] OAuth Params: []
>>> [info] [<0.826.0>] 127.0.0.1 - - GET /_all_dbs 200
>>> [debug] [<0.833.0>] 'POST' /biometrics/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.833.0>] OAuth Params: []
>>> [info] [<0.872.0>] Starting compaction for db "biometrics"
>>> [debug] [<0.877.0>] Compaction process spawned for db "biometrics"
>>> [info] [<0.833.0>] 127.0.0.1 - - POST /biometrics/_compact 202
>>> [debug] [<0.115.0>] 'POST' /biometrics/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.115.0>] OAuth Params: []
>>> [info] [<0.115.0>] 127.0.0.1 - - POST /biometrics/_view_cleanup 202
>>> [debug] [<0.114.0>] 'POST' /diabetes/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.114.0>] OAuth Params: []
>>> [info] [<0.890.0>] Starting compaction for db "diabetes"
>>> [debug] [<0.895.0>] Compaction process spawned for db "diabetes"
>>> [info] [<0.114.0>] 127.0.0.1 - - POST /diabetes/_compact 202
>>> [debug] [<0.113.0>] 'POST' /diabetes/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.113.0>] OAuth Params: []
>>> [info] [<0.113.0>] 127.0.0.1 - - POST /diabetes/_view_cleanup 202
>>> [debug] [<0.112.0>] 'POST' /fitness/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.112.0>] OAuth Params: []
>>> [info] [<0.909.0>] Starting compaction for db "fitness"
>>> [debug] [<0.914.0>] Compaction process spawned for db "fitness"
>>> [info] [<0.112.0>] 127.0.0.1 - - POST /fitness/_compact 202
>>> [debug] [<0.111.0>] 'POST' /fitness/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.111.0>] OAuth Params: []
>>> [info] [<0.111.0>] 127.0.0.1 - - POST /fitness/_view_cleanup 202
>>> [debug] [<0.110.0>] 'POST' /nutrition/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.110.0>] OAuth Params: []
>>> [info] [<0.927.0>] Starting compaction for db "nutrition"
>>> [debug] [<0.932.0>] Compaction process spawned for db "nutrition"
>>> [info] [<0.110.0>] 127.0.0.1 - - POST /nutrition/_compact 202
>>> [debug] [<0.109.0>] 'POST' /nutrition/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.109.0>] OAuth Params: []
>>> [info] [<0.109.0>] 127.0.0.1 - - POST /nutrition/_view_cleanup 202
>>> [debug] [<0.108.0>] 'POST' /routine/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.108.0>] OAuth Params: []
>>> [info] [<0.945.0>] Starting compaction for db "routine"
>>> [debug] [<0.950.0>] Compaction process spawned for db "routine"
>>> [info] [<0.108.0>] 127.0.0.1 - - POST /routine/_compact 202
>>> [debug] [<0.123.0>] 'POST' /routine/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.123.0>] OAuth Params: []
>>> [info] [<0.123.0>] 127.0.0.1 - - POST /routine/_view_cleanup 202
>>> [debug] [<0.122.0>] 'POST' /sleep/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.122.0>] OAuth Params: []
>>> [info] [<0.963.0>] Starting compaction for db "sleep"
>>> [debug] [<0.87.0>] New task status for <0.895.0>: [{changes_done,113},
>>>                                                 {database,<<"diabetes">>},
>>>                                                 {progress,100},
>>>                                                 {started_on,1457453394},
>>>                                                 {total_changes,113},
>>>                                                 {type,database_compaction},
>>>                                                 {updated_on,1457453395}]
>>> [debug] [<0.968.0>] Compaction process spawned for db "sleep"
>>> [info] [<0.122.0>] 127.0.0.1 - - POST /sleep/_compact 202
>>> [debug] [<0.385.0>] 'POST' /sleep/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.385.0>] OAuth Params: []
>>> [debug] [<0.890.0>] CouchDB swapping files /usr/local/var/lib/couchdb/diabetes.couch and /usr/local/var/lib/couchdb/diabetes.couch.compact.
>>> [info] [<0.385.0>] 127.0.0.1 - - POST /sleep/_view_cleanup 202
>>> [info] [<0.890.0>] Compaction for db "diabetes" completed.
>>> [debug] [<0.635.0>] 'POST' /tobacco_cessation/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.635.0>] OAuth Params: []
>>> [info] [<0.987.0>] Starting compaction for db "tobacco_cessation"
>>> [debug] [<0.992.0>] Compaction process spawned for db "tobacco_cessation"
>>> [info] [<0.635.0>] 127.0.0.1 - - POST /tobacco_cessation/_compact 202
>>> [debug] [<0.87.0>] New task status for <0.992.0>: [{changes_done,1},
>>>                                                 {database,
>>>                                                  <<"tobacco_cessation">>},
>>>                                                 {progress,100},
>>>                                                 {started_on,1457453395},
>>>                                                 {total_changes,1},
>>>                                                 {type,database_compaction},
>>>                                                 {updated_on,1457453395}]
>>> [debug] [<0.640.0>] 'POST' /tobacco_cessation/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.640.0>] OAuth Params: []
>>> [info] [<0.640.0>] 127.0.0.1 - - POST /tobacco_cessation/_view_cleanup 202
>>> [debug] [<0.987.0>] CouchDB swapping files /usr/local/var/lib/couchdb/tobacco_cessation.couch and /usr/local/var/lib/couchdb/tobacco_cessation.couch.compact.
>>> [info] [<0.987.0>] Compaction for db "tobacco_cessation" completed.
>>> [debug] [<0.815.0>] 'POST' /trackers/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.815.0>] OAuth Params: []
>>> [info] [<0.1011.0>] Starting compaction for db "trackers"
>>> [debug] [<0.1016.0>] Compaction process spawned for db "trackers"
>>> [info] [<0.815.0>] 127.0.0.1 - - POST /trackers/_compact 202
>>> [debug] [<0.866.0>] 'POST' /trackers/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.866.0>] OAuth Params: []
>>> [info] [<0.866.0>] 127.0.0.1 - - POST /trackers/_view_cleanup 202
>>> [debug] [<0.867.0>] 'POST' /users/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.867.0>] OAuth Params: []
>>> [info] [<0.1029.0>] Starting compaction for db "users"
>>> [debug] [<0.1034.0>] Compaction process spawned for db "users"
>>> [info] [<0.867.0>] 127.0.0.1 - - POST /users/_compact 202
>>> [debug] [<0.87.0>] New task status for <0.1034.0>: [{changes_done,0},
>>>                                                  {database,<<"users">>},
>>>                                                  {progress,0},
>>>                                                  {started_on,1457453395},
>>>                                                  {total_changes,0},
>>>                                                  {type,database_compaction},
>>>                                                  {updated_on,1457453395}]
>>> [debug] [<0.884.0>] 'POST' /users/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.884.0>] OAuth Params: []
>>> [info] [<0.884.0>] 127.0.0.1 - - POST /users/_view_cleanup 202
>>> [debug] [<0.1029.0>] CouchDB swapping files /usr/local/var/lib/couchdb/users.couch and /usr/local/var/lib/couchdb/users.couch.compact.
>>> [info] [<0.1029.0>] Compaction for db "users" completed.
>>> [debug] [<0.885.0>] 'POST' /weight/_compact {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.885.0>] OAuth Params: []
>>> [info] [<0.1053.0>] Starting compaction for db "weight"
>>> [debug] [<0.1058.0>] Compaction process spawned for db "weight"
>>> [info] [<0.885.0>] 127.0.0.1 - - POST /weight/_compact 202
>>> [debug] [<0.902.0>] 'POST' /weight/_view_cleanup {1,1} from "127.0.0.1"
>>> Headers: [{'Accept',"*/*"},
>>>        {'Content-Type',"application/json"},
>>>        {'Host',"localhost:5984"},
>>>        {'User-Agent',"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"}]
>>> [debug] [<0.902.0>] OAuth Params: []
>>> [info] [<0.902.0>] 127.0.0.1 - - POST /weight/_view_cleanup 202
>>> 
>>> Crash dump was written to: erl_crash.dump
>>> eheap_alloc: Cannot allocate 156725600 bytes of memory (of type "old_heap").
>>> 
>>> [gtarsa@prod-db01 ~]$ 
>>> 
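The "New task status" lines in this session carry the same fields that `GET /_active_tasks` returns as JSON (`type`, `database`, `changes_done`, `total_changes`, `progress`), and the original log shows that endpoint already being polled. Recording its output while compactions run would show which compaction was in flight when the VM aborted. A minimal formatting sketch, assuming the JSON array shape of CouchDB 1.6's `/_active_tasks`; the function name is illustrative and the sample values are copied from the task-status lines above:

```python
import json
from typing import List


def compaction_progress(active_tasks_json: str) -> List[str]:
    """Summarize database_compaction entries from a GET /_active_tasks response body."""
    lines = []
    for task in json.loads(active_tasks_json):
        if task.get("type") != "database_compaction":
            continue  # ignore replication, indexer, view_compaction, ...
        done = task.get("changes_done", 0)
        total = task.get("total_changes", 0)
        # Prefer CouchDB's own percentage; fall back to computing it.
        pct = task.get("progress", (100 * done // total) if total else 0)
        lines.append("%s: %d/%d changes (%d%%)" % (task.get("database"), done, total, pct))
    return lines


sample = json.dumps([
    {"type": "database_compaction", "database": "diabetes",
     "changes_done": 113, "total_changes": 113, "progress": 100},
    {"type": "database_compaction", "database": "users",
     "changes_done": 0, "total_changes": 0, "progress": 0},
])
for line in compaction_progress(sample):
    print(line)
```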
>>> 
>>> 
>>>> On Mar 7, 2016, at 4:09 PM, Jan Lehnardt <jan@apache.org> wrote:
>>>> 
>>>> Heya Greg,
>>>> 
>>>> this should definitely not happen at all, regardless of AWS storage type.
>>>> 
>>>> Are there any other things going on on the VM, when you do this?
>>>> 
>>>> Can you reliably reproduce this behaviour?
>>>> 
>>>> Are there other correlating factors (like does this always happen at the same time / due to a cronjob, etc)?
>>>> 
>>>> Can you set your CouchDB log level to debug and see if that gets you more info? (curl -X PUT http://[user:pass@]127.0.0.1:5984/_config/log/level -d '"debug"').
>>>> 
>>>> Is it possible for you to share these database files (publicly or in private)?
>>>> 
>>>> What are your disk usage levels before/during compaction?
>>>> 
>>>> Are you getting anything in the system log(s)?
>>>> 
>>>> Best
>>>> Jan
>>>> -- 
>>>> Professional Support for Apache CouchDB:
>>>> https://neighbourhood.ie/couchdb-support/
>>>> 
>>>> 
>>>>> On 07 Mar 2016, at 21:27, Greg Tarsa <gtarsa@axialproject.com> wrote:
>>>>> 
>>>>> We have a set of CouchDB databases that we use to collect user information for various purposes. I inherited this configuration from a predecessor and am relatively new to CouchDB.
>>>>> 
>>>>> Whenever we attempt to compact the databases, the server crashes without any messages in either the CouchDB log or the system logs. This is running on an AWS instance with an EBS volume.
>>>>> 
>>>>> Experiments have shown that if the instance is configured with instance storage (ephemeral storage that disappears when the instance disappears), this operation works properly. But we would like to use larger volumes and have persistence.
>>>>> 
>>>>> When the instance is configured with an external EBS volume, we see the server crash described above.
>>>>> 
>>>>> I have searched the web for “couchdb compaction crash no log” and not found anything helpful.
>>>>> 
>>>>> It seems like compacting while running should not fail at all, much less silently, so I am looking for insights into the problem, or solutions if any exist.
>>>>> 
>>>>> Configuration and log info is below.
>>>>> 
>>>>> Any help would be appreciated.
>>>>> 
>>>>> Thanks,
>>>>> Greg
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------
>>>>> 
>>>>> CouchDB version: 1.6.1
>>>>> OS: RHEL 6.6
>>>>> 
>>>>> ---------------------------------------------------------
>>>>> 
>>>>> Here is a directory listing of the databases at the time of the crash:
>>>>> 
>>>>> cat bad.couch.dbinfo.txt 
>>>>> total 15400740
>>>>> 12 -rw-r--r--. 1 couchdb couchdb       8297 Jan 20 16:31 _users.couch
>>>>> 16 -rw-r--r--. 1 couchdb couchdb      12393 Jan 20 16:33 _replicator.couch
>>>>> 21060 -rw-r--r--. 1 couchdb couchdb   21557368 Mar  7 11:57 biometrics.couch
>>>>> 781136 -rw-r--r--. 1 couchdb couchdb  799875192 Mar  7 12:00 fitness.couch
>>>>> 954244 -rw-r--r--. 1 couchdb couchdb  977137784 Mar  7 12:05 nutrition.couch
>>>>> 8419624 -rw-r--r--. 1 couchdb couchdb 8621678721 Mar  7 12:06 routine.couch
>>>>> 390796 -rw-r--r--. 1 couchdb couchdb  400167032 Mar  7 12:06 sleep.couch
>>>>> 217932 -rw-r--r--. 1 couchdb couchdb  223154296 Mar  7 12:06 weight.couch
>>>>> 4614884 -rw-r--r--. 1 couchdb couchdb 4725629060 Mar  7 12:06 trackers.couch
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 fitness.couch.compact
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 nutrition.couch.compact
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 routine.couch.compact
>>>>> 64 -rw-r--r--. 1 couchdb couchdb      61551 Mar  7 12:41 diabetes.couch
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 sleep.couch.compact
>>>>> 12 -rw-r--r--. 1 couchdb couchdb       8300 Mar  7 12:41 tobacco_cessation.couch
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 users.couch
>>>>>  4 -rw-r--r--. 1 couchdb couchdb         79 Mar  7 12:41 weight.couch.compact
>>>>> 152 -rw-r--r--. 1 couchdb couchdb     151797 Mar  7 12:42 trackers.couch.compact
>>>>> 784 -rw-r--r--. 1 couchdb couchdb     801865 Mar  7 12:42 biometrics.couch.compact
>>>>> 
>>>>> ---------------------------------------------------------
>>>>> 
>>>>> Here are the contents of the log at the time of the crash:
>>>>> 
>>>>> [Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.31.0>] Apache CouchDB has started on http://0.0.0.0:5984/
>>>>> [Mon, 07 Mar 2016 17:25:32 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:33 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:34 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:35 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:37 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:38 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:40 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:25:45 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:50 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:52 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.114.0>] 10.1.1.12 - - GET /users/_changes?feed=continuous&style=all_docs&since=0&heartbeat=10000 200
>>>>> [Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:25:54 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:25:55 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:26:00 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:26:00 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /_active_tasks 200
>>>>> [Mon, 07 Mar 2016 17:26:01 GMT] [info] [<0.108.0>] 127.0.0.1 - - GET /favicon.ico 200
>>>>> [Mon, 07 Mar 2016 17:26:05 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> ... [numerous GET /users/ 200 messages removed for brevity] ...
>>>>> [Mon, 07 Mar 2016 17:41:51 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.152.0>] 127.0.0.1 - - GET /_all_dbs 200
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1157.0>] Starting compaction for db "biometrics"
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.151.0>] 127.0.0.1 - - POST /biometrics/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.150.0>] 127.0.0.1 - - POST /biometrics/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1175.0>] Starting compaction for db "diabetes"
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.198.0>] 127.0.0.1 - - POST /diabetes/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.197.0>] 127.0.0.1 - - POST /diabetes/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1193.0>] Starting compaction for db "fitness"
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.118.0>] 127.0.0.1 - - POST /fitness/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.119.0>] 127.0.0.1 - - POST /fitness/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.1211.0>] Starting compaction for db "nutrition"
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.120.0>] 127.0.0.1 - - POST /nutrition/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:52 GMT] [info] [<0.121.0>] 127.0.0.1 - - POST /nutrition/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1229.0>] Starting compaction for db "routine"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.122.0>] 127.0.0.1 - - POST /routine/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.115.0>] 127.0.0.1 - - POST /routine/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1175.0>] Compaction for db "diabetes" completed.
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1254.0>] Starting compaction for db "sleep"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.116.0>] 127.0.0.1 - - POST /sleep/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.117.0>] 127.0.0.1 - - POST /sleep/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1272.0>] Starting compaction for db "tobacco_cessation"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.184.0>] 127.0.0.1 - - POST /tobacco_cessation/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.183.0>] 127.0.0.1 - - POST /tobacco_cessation/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1290.0>] Starting compaction for db "trackers"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.182.0>] 127.0.0.1 - - POST /trackers/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1272.0>] Compaction for db "tobacco_cessation" completed.
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1151.0>] 127.0.0.1 - - POST /trackers/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.162.0>] Starting compaction for db "users"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1152.0>] 127.0.0.1 - - POST /users/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1168.0>] 127.0.0.1 - - POST /users/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.162.0>] Compaction for db "users" completed.
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1329.0>] Starting compaction for db "weight"
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1170.0>] 127.0.0.1 - - POST /weight/_compact 202
>>>>> [Mon, 07 Mar 2016 17:41:53 GMT] [info] [<0.1187.0>] 127.0.0.1 - - POST /weight/_view_cleanup 202
>>>>> [Mon, 07 Mar 2016 17:41:56 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:42:01 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:42:06 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> [Mon, 07 Mar 2016 17:42:11 GMT] [info] [<0.123.0>] 10.1.1.12 - - GET /users/ 200
>>>>> 
>>>>> --------------------------------------------------
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> -- 
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
> 

