couchdb-user mailing list archives

From Robert Samuel Newson <rnew...@apache.org>
Subject Re: bigcouch/couchdb file descriptor leak during compaction
Date Fri, 31 Jan 2014 10:28:41 GMT
https://github.com/cloudant/bigcouch/tree/f0f5a107c0b895dd72187c10baedec24b85329a9



On 31 Jan 2014, at 10:13, Vladimir Ralev <vladimir.ralev@gmail.com> wrote:

> I tried to rule out a file system problem and I did these:
> 
> chmod -R 777 /opt/bigcouch
> chown -R root /opt/bigcouch
> 
> Then ran the bigcouch as root.
> 
> I still have a leak, but it's for other files:
> 
> beam.smp  28679        root *086u      REG              254,2      8282 32250112
> /opt/bigcouch/var/lib/.shards/80000000-9fffffff/db1/1a/3a/3e9b00e2e5e72df737bf30cd24ad.1376585909_design/dfa1fd4be3aecb20848cad2feb20e00a.view
> 
> beam.smp  28679        root *087u      REG              254,2      8285 32246907
> /opt/bigcouch/var/lib/.shards/80000000-9fffffff/db1/07/d7/e18bfed619a6078c2b19fef66b2c.1371491971_design/dfa1fd4be3aecb20848cad2feb20e00a.view
> 
> 
> The deleted files don't leak anymore, and there are no errors in the
> BigCouch log. As far as I can tell this happens on all compactions, even if
> I pace them slowly. The machine eventually runs out of memory (because the
> system limits are really high).
> 
> Can somebody point me to the source code of the CouchDB used in BigCouch
> 0.4.2? I will add some extra logging here and there to see if I can figure it out.
> 
> 
> 
> 
> 
> On Wed, Jan 29, 2014 at 2:22 AM, Robert Samuel Newson <rnewson@apache.org> wrote:
> 
>> 
>> Yes, anything moved to the .delete directory is fair game for deletion.
>> 
>> CouchDB and BigCouch move the file there and then delete it. This is so that,
>> in the event of a crash, only one directory needs to be cleaned up rather
>> than requiring a potentially expensive recursive sweep.
>> 
>> As for why BigCouch fails to release your files, I don't know. Is this
>> happening for *all* compactions or is it quite rare but has accumulated
>> over a long period of time?
>> 
>> The difference between 0.4.0 and 0.4.2 is small and there's nothing that
>> would induce this issue afaik.
>> 
>> B.
>> 
>> On 28 Jan 2014, at 23:02, Vladimir Ralev <vladimir.ralev@gmail.com> wrote:
>> 
>>> Erlang R15B03 (erts-5.9.3.1) [source] [64-bit] [smp:8:8] [async-threads:0]
>>> [hipe] [kernel-poll:false]
>>> 
>>> Bigcouch is 0.4.2 latest.
>>> {"couchdb":"Welcome","version":"1.1.1","bigcouch":"0.4.2"}
>>> I haven't found anything unusual about the disk/OS/FS, all Debian defaults.
>>> But I will keep looking. I will try to look into the source code. Is it
>>> safe to assume all these deleted files are deleted by the compaction code
>>> and not some other part of the system?
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Jan 29, 2014 at 12:23 AM, Robert Samuel Newson
>>> <rnewson@apache.org> wrote:
>>> 
>>>> 
>>>> 
>>>> What version of Erlang is this? There are some to avoid, R14B02 being the
>>>> most notable.
>>>> 
>>>> Curious that you have so many of these. Is there anything odd about the
>>>> filesystem or disk?
>>>> 
>>>> All that said, a restart is your only method of freeing these if BigCouch
>>>> (0.4.0 I assume?) is not able to.
>>>> 
>>>> B
>>>> 
>>>> On 28 Jan 2014, at 21:49, Vladimir Ralev <vladimir.ralev@gmail.com> wrote:
>>>> 
>>>>> Hi guys,
>>>>> 
>>>>> I am monitoring a huge compaction right now and keeping an eye on the
>>>>> system so that it does not fail with an emfile error.
>>>>> 
>>>>> The number of file descriptors owned by couch is growing very fast. I can do:
>>>>> 
>>>>> lsof -p CouchPID
>>>>> 
>>>>> and get tons of these:
>>>>> 
>>>>> beam.smp 21853 root *152u   REG  254,2        8306 30670917
>>>>> /opt/bigcouch/var/lib/.delete/b4b3ab2330a9672d7138fb562ebf90dd (deleted)
>>>>> 
>>>>> beam.smp 21853 root *153u   REG  254,2        8282 30671071
>>>>> /opt/bigcouch/var/lib/.delete/a218b0088e72278990f848fd8b2de5d9 (deleted)
>>>>> 
>>>>> beam.smp 21853 root *154u   REG  254,2        8372 30670973
>>>>> /opt/bigcouch/var/lib/.delete/05a22639d021929b31c982954ef9e99b (deleted)
>>>>> 
>>>>> beam.smp 21853 root *155u   REG  254,2        8297 30671201
>>>>> /opt/bigcouch/var/lib/.delete/6669ce0a6c235a977ea46ded37928338 (deleted)
>>>>> 
>>>>> beam.smp 21853 root *156u   REG  254,2        8294 30670974
>>>>> /opt/bigcouch/var/lib/.delete/bd2a65a16205529faf9118f0bd6d26b1 (deleted)
>>>>> 
>>>>> beam.smp 21853 root *157u   REG  254,2        8294 30671159
>>>>> /opt/bigcouch/var/lib/.delete/6b87bba6fd0b87c1bcaf47a2ba22aee4 (deleted)
>>>>> 
>>>>> beam.smp 21853 root *158u   REG  254,2        8294 30670975
>>>>> /opt/bigcouch/var/lib/.delete/392e9bd3825ff953dc808f83f6cba97e (deleted)
>>>>> 
>>>>> 
>>>>> Currently there are 55000 such descriptors and they are never released.
>>>>> Note that these files are indeed deleted, but the system doesn't release
>>>>> the handles. I can't find a reference to similar problems. Is this a known
>>>>> issue I should watch out for?
>>>>> 
>>>>> 
>>>>> I suppose I can restart the system in between compactions to release the
>>>>> files, but if you have any other advice it's highly appreciated.
>>>> 
>>>> 
>> 
>> 
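
For anyone tracking the same symptom on Linux, the `lsof -p CouchPID` check from the thread can be approximated by reading `/proc/<pid>/fd` directly and tallying what each descriptor points at. This is only a rough sketch and not part of BigCouch or CouchDB: the function names `classify_fd_targets` and `read_fd_targets` are hypothetical, and it assumes a Linux `/proc` filesystem, where the link target of an open but unlinked file ends in ` (deleted)`.

```python
import os
from collections import Counter


def classify_fd_targets(targets):
    """Tally descriptor targets as 'deleted', 'view', or 'other'.

    'deleted' covers open files that were already unlinked (the .delete case
    in the thread); 'view' covers open *.view index files.
    """
    counts = Counter()
    for target in targets:
        if target.endswith(" (deleted)"):
            counts["deleted"] += 1
        elif target.endswith(".view"):
            counts["view"] += 1
        else:
            counts["other"] += 1
    return counts


def read_fd_targets(pid):
    """Return the link target of every descriptor under /proc/<pid>/fd.

    Assumes Linux; needs permission to inspect the target process.
    """
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

Calling `classify_fd_targets(read_fd_targets(pid))` at intervals during a compaction, with `pid` set to the beam.smp process ID, should show whether the 'deleted' or 'view' count keeps climbing.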

