couchdb-dev mailing list archives

From Robert Samuel Newson <rnew...@apache.org>
Subject Re: Compaction does not release file descriptors
Date Mon, 05 Sep 2016 12:29:32 GMT
Thanks Alex.

I can't see an obvious path in the compaction daemon module where we open but fail to close,
so perhaps the leak is elsewhere in the compaction code. We could still move to using
couch_util:with_db/2 in the daemon for extra safety.

It would be useful to know what kind of processes are keeping the files open. We can see those
with process_info(Pid of couch_file, [monitored_by]).

B.

> On 5 Sep 2016, at 13:20, Alexander Shorin <kxepal@gmail.com> wrote:
> 
> Robert,
> 
> This is an old bug in the compactor, regardless of which Erlang version
> is used. I hit it while working on munin-couchdb:
> https://github.com/gws/munin-plugin-couchdb/#open-files
> 
> The graph shows what happens if you compact a lot of databases
> continuously for some time. I recall Davis, Adam, and I burrowing
> into the code looking for the place where we don't close the file,
> but the cause wasn't easy to find.
> 
> --
> ,,,^..^,,,
> 
> 
> On Mon, Sep 5, 2016 at 12:32 PM, Robert Samuel Newson
> <rnewson@apache.org> wrote:
>> Hi Nigel,
>> 
>> Thanks for the report. We've seen this issue with the R14 series of Erlang, where calling
>> file:close/1 doesn't always close the file descriptor, so I suggest trying something newer
>> (I can vouch for 17.5 from extensive production experience).
>> 
>> Can you confirm if this is _every_ compaction or only a subset? If the latter, can
>> you estimate what percentage?
>> 
>> You say "at 50%" so I'm inferring you've enabled the compaction daemon, but please
>> confirm.
>> 
>> B.
>> 
>> 
>>> On 5 Sep 2016, at 09:53, Nigel Phippen <Nigel.Phippen@s-a-m.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> This is my first post so please bear with me.
>>> 
>>> I am running CouchDB 1.6.1 (with Erlang R14B-04.3.el6) on CentOS 6.7.
>>> 
>>> I have multiple databases on our single server, with each database having around
>>> a dozen views. Thousands of new documents are added to the databases throughout the day but
>>> there are no document deletions (unless done for administrative purposes). Many documents
>>> are regularly updated, possibly hundreds of times, leading to documents having multiple versions.
>>> Database and view compaction is set to occur at 50%.
>>> 
>>> The problem I am seeing is that, over the course of several days, disk space
>>> is being consumed in the volume housing the CouchDB databases. Upon investigation, I can see
>>> that CouchDB (or possibly some other process) appears to have moved files to a
>>> '/usr/local/var/lib/couchdb/.delete' folder, ready for deletion, but has not actually
>>> fully deleted the files.
>>> 
>>>    -------------------------------------------------------
>>>    # /usr/sbin/lsof +aL1
>>> 
>>>    COMMAND     PID      USER   FD   TYPE DEVICE SIZE/OFF NLINK   NODE NAME
>>>    beam.smp  21784   couchdb   19u   REG  253,1 12747263     0 157740 /usr/local/var/lib/couchdb/.delete/e3a4de3acbf62f6fe6621c0d584adcee (deleted)
>>>    beam.smp  21784   couchdb   41u   REG  253,1 13292013     0 157757 /usr/local/var/lib/couchdb/.delete/7202d47094b51d60d9a4cc39f448f2c8 (deleted)
>>>    beam.smp  21784   couchdb   61u   REG  253,1 12317183     0 158688 /usr/local/var/lib/couchdb/.delete/518f417167c31921925fe66b11ca85d2 (deleted)
>>>    beam.smp  21784   couchdb   64u   REG  253,1  8471022     0 158669 /usr/local/var/lib/couchdb/.delete/bea3b216976a62912ee79034fc374314 (deleted)
>>>    beam.smp  21784   couchdb  162u   REG  253,1  9097692     0 139109 /usr/local/var/lib/couchdb/.delete/48f75f12d680afbd7ec1c0c3c01ccb99 (deleted)
>>>    beam.smp  21784   couchdb  168u   REG  253,1  8901102     0 155061 /usr/local/var/lib/couchdb/.delete/e5692819be8422a83f675daa1267cc3a (deleted)
>>>    beam.smp  21784   couchdb  187u   REG  253,1 13046253     0 157756 /usr/local/var/lib/couchdb/.delete/8f2cb8517ab7659cc04091cc9db735e8 (deleted)
>>>    -------------------------------------------------------
>>> 
>>> Over several days, there can be dozens of these files, consuming GBytes of space.
>>> Left unchecked, all disk space in the /usr volume will be consumed, causing the system to
>>> fail. The only way to clear out the files for good is to restart the CouchDB service.
>>> 
>>> This appears to be the same problem as reported in https://issues.apache.org/jira/browse/COUCHDB-926
>>> over five years ago.
>>> 
>>> I'd appreciate any assistance in resolving this issue. Please let me know if
>>> additional information is required.
>>> 
>>> Many thanks,
>>> 
>>> Nigel.
>>> 
>> 
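
To put a number on how much disk space the unlinked-but-open files in Nigel's report are holding, the lsof +aL1 listing can be totalled with a short script. What follows is a minimal Python sketch and not something posted in the thread: the function name parse_deleted_open_files and the SAMPLE constant are hypothetical, the sample rows are copied from the report, and the parser assumes the column layout shown there (SIZE/OFF in the seventh column, NLINK in the eighth) and paths that contain no spaces.

```python
# Rows copied from the lsof +aL1 listing in the original report.
SAMPLE = """\
COMMAND     PID      USER   FD   TYPE DEVICE SIZE/OFF NLINK   NODE NAME
beam.smp  21784   couchdb   19u   REG  253,1 12747263     0 157740 /usr/local/var/lib/couchdb/.delete/e3a4de3acbf62f6fe6621c0d584adcee (deleted)
beam.smp  21784   couchdb   41u   REG  253,1 13292013     0 157757 /usr/local/var/lib/couchdb/.delete/7202d47094b51d60d9a4cc39f448f2c8 (deleted)
beam.smp  21784   couchdb   61u   REG  253,1 12317183     0 158688 /usr/local/var/lib/couchdb/.delete/518f417167c31921925fe66b11ca85d2 (deleted)
beam.smp  21784   couchdb   64u   REG  253,1  8471022     0 158669 /usr/local/var/lib/couchdb/.delete/bea3b216976a62912ee79034fc374314 (deleted)
beam.smp  21784   couchdb  162u   REG  253,1  9097692     0 139109 /usr/local/var/lib/couchdb/.delete/48f75f12d680afbd7ec1c0c3c01ccb99 (deleted)
beam.smp  21784   couchdb  168u   REG  253,1  8901102     0 155061 /usr/local/var/lib/couchdb/.delete/e5692819be8422a83f675daa1267cc3a (deleted)
beam.smp  21784   couchdb  187u   REG  253,1 13046253     0 157756 /usr/local/var/lib/couchdb/.delete/8f2cb8517ab7659cc04091cc9db735e8 (deleted)
"""


def parse_deleted_open_files(lsof_output):
    """Return (fd, size_bytes, path) for each regular file with link count 0.

    Assumed columns (as in the report):
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME [(deleted)]
    """
    entries = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and anything that is not a regular file.
        if len(fields) < 10 or fields[4] != "REG":
            continue
        size, nlink = fields[6], fields[7]
        # Skip rows where SIZE/OFF is an offset (e.g. "0t1234") or the file is still linked.
        if not size.isdigit() or nlink != "0":
            continue
        entries.append((fields[3], int(size), fields[9]))
    return entries


if __name__ == "__main__":
    held = parse_deleted_open_files(SAMPLE)
    total_bytes = sum(size for _fd, size, _path in held)
    print(f"{len(held)} deleted files still open, holding {total_bytes} bytes")
```

Running the same function over fresh lsof output at intervals would show whether the count grows with every compaction or only with some of them, which is the question Robert asks above.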

