hadoop-common-user mailing list archives

From Jason Venner <ja...@attributor.com>
Subject Re: File Descriptors not cleaned up
Date Mon, 10 Nov 2008 15:09:00 GMT
We have just realized that one cause of the '/no live node contains block/' 
error from /DFSClient/ is that the /DFSClient/ was unable to open a 
connection because too few file descriptors were available.

FsShell is particularly bad about consuming descriptors without closing 
them, leaving it to the garbage collector to reclaim each descriptor when 
it collects the object that holds it.
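
To illustrate the difference between closing a stream explicitly and 
leaving it for the garbage collector, here is a minimal sketch. It uses 
plain java.io/java.nio rather than the Hadoop API so that it runs on its 
own; the class name CloseExplicitly and the method firstLine are 
hypothetical and are not taken from FsShell or from the planned patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseExplicitly {
    // Reads the first line of a file. try-with-resources closes the reader,
    // and with it the underlying descriptor, before the method returns,
    // instead of waiting for the garbage collector to finalize the stream.
    static String firstLine(Path path) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fd-demo", ".txt");
        try {
            Files.write(tmp, "hello\n".getBytes(StandardCharsets.UTF_8));
            // Many opens in a row: each descriptor is released immediately,
            // so the loop does not depend on GC timing to stay under the limit.
            for (int i = 0; i < 10000; i++) {
                firstLine(tmp);
            }
            System.out.println(firstLine(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The same pattern should apply to any Closeable stream, presumably 
including the ones returned by fs.open(), but that is an assumption 
rather than something verified against the Hadoop source here.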

We will submit a patch in a few days.

Raghu Angadi wrote:
> Arv Mistry wrote:
>> Raghu,
>> In the test program I see 3 fd's used when fs.open() is called. Two
>> of these are pipes and one is an eventpoll.
>> These 3 are never cleaned up and stay around. I track this by running it
>> in debug mode, setting a breakpoint, and using
>> lsof -p <pid> to see the fd's. I diff the output from before the open
>> and after the open.
> It is important to know _exactly_ where the "before" and "after" 
> breakpoints are in your example to answer accurately. In your example, I 
> don't see why the extra thread matters. Maybe if you give me a runnable 
> or close to runnable example, I will know.
> But that does *not* mean there is an fd leak.
> For example, extend your example like this: after the first thread 
> exits, repeat the same thing again. Do you see 6 more extra fds? You 
> wouldn't, or rather you shouldn't.
> If you want to explore further, now sleep for 15 seconds in the main 
> thread after the second thread exits. Then invoke TestThread.run() in 
> the main thread (instead of using a separate thread). Check lsof after 
> run() returns. What do you see?
> If you do these experiments and still think there is a leak, please 
> file a Jira.. file a jira even if you don't do the experiments :).
> IMHO, I still don't see any suspicious behavior. Maybe the 'lsof' output 
> from when your app hits the 'too many open files' exception will clear 
> this up for us.
> Hope this helps.
> Raghu.
>> What I don't understand is why this doesn't get cleaned up when done in
>> a separate thread but does when it's done in a single thread.
>> This is a problem in the real system because I run out of fd's and am no
>> longer able to open any more files after a few weeks.
>> This forces me to do a system restart to flush things out.
>> Cheers Arv
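
The before/after comparison discussed in the quoted messages can also be 
done from inside the JVM instead of with lsof and a breakpoint. The sketch 
below assumes a HotSpot/OpenJDK JVM on a Unix-like system, where the 
platform bean com.sun.management.UnixOperatingSystemMXBean is available. 
The class name FdCount is hypothetical, and the worker thread only opens 
and closes a temporary file as a stand-in for the fs.open() call in the 
original TestThread, which is not shown in this thread.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdCount {
    // Returns the number of descriptors this JVM currently has open, or -1
    // when the platform bean does not expose the count (e.g. non-Unix JVMs).
    static long openFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1;
    }

    // Runs the stand-in work in a separate thread and waits for it to exit.
    static void runInThread(File file) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try (FileInputStream in = new FileInputStream(file)) {
                in.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        worker.start();
        worker.join();
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("fd-count", ".tmp");
        try {
            long before = openFdCount();
            runInThread(file);
            long afterFirst = openFdCount();
            // Repeat the same work: a real leak would keep growing per run,
            // while one-time setup costs would show up only after the first.
            runInThread(file);
            long afterSecond = openFdCount();
            System.out.println("before=" + before
                    + " afterFirst=" + afterFirst
                    + " afterSecond=" + afterSecond);
        } finally {
            file.delete();
        }
    }
}
```

The counts can move for unrelated reasons (class loading, JIT, logging), 
so a single small difference is not conclusive; steady growth across many 
repetitions is the more telling signal.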
