kafka-dev mailing list archives

From "Henry Cai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3904) File descriptor leaking (Too many open files) for long running stream process
Date Tue, 28 Jun 2016 00:39:57 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352126#comment-15352126 ]

Henry Cai commented on KAFKA-3904:
----------------------------------

I took a look at FileChannel.open(). It still creates a file descriptor for the
channel, so the underlying problem of creating too many file descriptors is still there.

I am not one hundred percent sure we can use the new FileChannel.open(), since it relies on the
underlying FileSystemProvider.newFileChannel(), and some implementations throw UnsupportedOperationException.

I think I will still use the traditional RandomAccessFile.getChannel and post a PR for this.
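
As a rough illustration of the "close or reuse" idea from the issue description (this is
not the actual PR), the sketch below opens one channel per lock file through
RandomAccessFile.getChannel() and reuses it on every retry, so repeated lock attempts
do not add file descriptors. The class name StateDirLocks, the openChannels() helper and
the hard-coded ".lock" file name (taken from the lsof output) are assumptions made
for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not Kafka's code: keeps a single channel per lock file. */
final class StateDirLocks {
    // Assumed file name, matching the ".lock" entries in the lsof output.
    private static final String LOCK_FILE_NAME = ".lock";

    private final Map<File, FileChannel> channels = new HashMap<>();
    private final Map<File, FileLock> locks = new HashMap<>();

    /** Tries to lock stateDir; returns false if the lock is already held. */
    synchronized boolean tryLock(File stateDir) throws IOException {
        File lockFile = new File(stateDir, LOCK_FILE_NAME);
        if (locks.containsKey(lockFile)) {
            return false; // already held through this registry
        }
        FileChannel channel = channels.get(lockFile);
        if (channel == null) {
            // Opened once per lock file, so retries reuse the same descriptor.
            channel = new RandomAccessFile(lockFile, "rw").getChannel();
            channels.put(lockFile, channel);
        }
        FileLock lock;
        try {
            lock = channel.tryLock();
        } catch (OverlappingFileLockException e) {
            return false; // held through another channel in this JVM
        }
        if (lock == null) {
            return false; // held by another process
        }
        locks.put(lockFile, lock);
        return true;
    }

    /** Releases the lock (if held) and closes the cached channel. */
    synchronized void unlock(File stateDir) throws IOException {
        File lockFile = new File(stateDir, LOCK_FILE_NAME);
        FileLock lock = locks.remove(lockFile);
        if (lock != null) {
            lock.release();
        }
        FileChannel channel = channels.remove(lockFile);
        if (channel != null) {
            channel.close(); // frees the file descriptor
        }
    }

    /** Number of channels (and so descriptors) currently held open. */
    synchronized int openChannels() {
        return channels.size();
    }
}
```

Reuse is sketched here rather than closing a fresh channel after each failed attempt
because the FileLock documentation warns that on some systems closing a channel releases
all locks the JVM holds on that file, and it recommends using a unique channel per
file. The sketch does not track which caller owns a lock, so a real version would need
to make sure only the owner calls unlock().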


> File descriptor leaking (Too many open files) for long running stream process
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-3904
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3904
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Henry Cai
>            Assignee: Henry Cai
>              Labels: architecture, newbie
>
> I noticed that when my application has been running for a long time (> 1 day), I get
> a 'Too many open files' error.
> I used 'lsof' to list all the file descriptors used by the process. There are over 32K,
> and most of them belong to the .lock file, e.g. the lock file below shows up about 2700 times.
> I looked at the code, I think the problem is in:
>     File lockFile = new File(stateDir, ProcessorStateManager.LOCK_FILE_NAME);
>     FileChannel channel = new RandomAccessFile(lockFile, "rw").getChannel();
> Each time new RandomAccessFile is called, a new fd is created. We should probably
> either close or reuse this RandomAccessFile object.
> lsof result:
> java    14799 hcai *740u   REG                9,0        0 2415928585 /mnt/stream/join/rocksdb/ads-demo-30/0_16/.lock
> java    14799 hcai *743u   REG                9,0        0 2415928585 /mnt/stream/join/rocksdb/ads-demo-30/0_16/.lock
> java    14799 hcai *746u   REG                9,0        0 2415928585 /mnt/stream/join/rocksdb/ads-demo-30/0_16/.lock
> java    14799 hcai *755u   REG                9,0        0 2415928585 /mnt/stream/join/rocksdb/ads-demo-30/0_16/.lock
> hcai@teststream02001:~$ lsof -p 14799 | grep lock | grep 0_16  | wc
>    2709   24381  319662



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
