hadoop-common-user mailing list archives

From "Dhruba Borthakur" <dhr...@yahoo-inc.com>
Subject RE: Calling FsShell.doMain() hold so many threads
Date Tue, 31 Jul 2007 07:07:45 GMT
Ok, I created http://issues.apache.org/jira/browse/HADOOP-1666 to track this
issue. I also attached a patch to that issue.

Thanks,
dhruba

-----Original Message-----
From: KrzyCube [mailto:yuxh312@gmail.com] 
Sent: Sunday, July 29, 2007 7:46 PM
To: hadoop-user@lucene.apache.org
Subject: RE: Calling FsShell.doMain() hold so many threads


Although a little late, I tested the second patch and it works fine.

But I still have a question:

Who is really holding the threads?
Is it that the FsShell instance cannot be disposed of, or is it the DFSClient instance?
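
One possible reading of the stack trace quoted further down: DFSClient.<init>
calls Thread.start, so each FsShell.init() that obtains a new file system
appears to start a thread that stays alive until that client is closed. The
following is a minimal, self-contained Java sketch of that pattern. It is not
Hadoop code: BackgroundClient, ThreadLeakDemo, leakyLoop and closingLoop are
hypothetical names invented for illustration, and the real owner of the
threads should still be confirmed with a thread dump, as Raghu suggested.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in (an assumption, not a Hadoop class) for a client
// whose constructor starts a background thread.
class BackgroundClient implements AutoCloseable {
    // Number of background threads started and not yet finished.
    static final AtomicInteger liveThreads = new AtomicInteger();

    private final Thread worker;
    private volatile boolean running = true;

    BackgroundClient() {
        liveThreads.incrementAndGet();
        worker = new Thread(() -> {
            try {
                while (running) {
                    Thread.sleep(1000); // pretend to do periodic housekeeping
                }
            } catch (InterruptedException e) {
                // interrupted by close(): fall through and exit
            } finally {
                liveThreads.decrementAndGet();
            }
        });
        worker.setDaemon(true); // lets this demo JVM exit despite leaked threads
        worker.start();
    }

    // Stops the background thread and waits for it to finish.
    @Override
    public void close() {
        running = false;
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class ThreadLeakDemo {
    // Creates n clients and drops them without close();
    // returns how many extra threads are left running.
    static int leakyLoop(int n) {
        int before = BackgroundClient.liveThreads.get();
        for (int i = 0; i < n; i++) {
            new BackgroundClient();
        }
        return BackgroundClient.liveThreads.get() - before;
    }

    // Creates n clients but closes each one before creating the next;
    // returns how many extra threads are left running.
    static int closingLoop(int n) {
        int before = BackgroundClient.liveThreads.get();
        for (int i = 0; i < n; i++) {
            try (BackgroundClient c = new BackgroundClient()) {
                // use the client here
            }
        }
        return BackgroundClient.liveThreads.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("closed each time: " + closingLoop(100) + " threads left");
        System.out.println("never closed:     " + leakyLoop(100) + " threads left");
    }
}
```

In this sketch the dropped clients are never garbage collected into silence:
each running thread keeps its client reachable, so only an explicit close()
(or reusing one long-lived client for all operations) keeps the thread count
flat. Whether the same holds for FsShell and DFSClient depends on the Hadoop
version and on the patch attached to HADOOP-1666.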


Dhruba Borthakur wrote:
> 
> 
> Ok, can you please remove the earlier patch I gave you and use this
> modified patch instead? This should work.
> 
> Thanks,
> Dhruba
> 
> 
> -----Original Message-----
> From: KrzyCube [mailto:yuxh312@gmail.com] 
> Sent: Wednesday, July 25, 2007 12:51 AM
> To: hadoop-user@lucene.apache.org
> Subject: RE: Calling FsShell.doMain() hold so many threads
> 
> 
> Hi dhruba,
> 
> I have tried the patch [restartableFsShell.patch], but the problem is
> still there.
> 
> I stepped through the code in debug mode, and the "fs = null" statements
> in both init() and the finally block were all hit, yet the threads are
> still being created.
> 
> So I think there must be some other problem.
> I will give a more detailed description later, with my code, my
> exceptions, and the snapshot I captured under Windows XP
> [only because I don't know how to view the thread count of a process
> under Ubuntu Linux].
> 
> 
> Dhruba Borthakur wrote:
>> 
>> Please try this attached patch, let me know if it works.
>> 
>> Thanks,
>> dhruba
>> 
>> -----Original Message-----
>> From: KrzyCube [mailto:yuxh312@gmail.com] 
>> Sent: Tuesday, July 24, 2007 6:19 PM
>> To: hadoop-user@lucene.apache.org
>> Subject: Re: Calling FsShell.doMain() hold so many threads
>> 
>> 
>> First of all, thanks, Raghu.
>> 
>> Here's the exception info:
>> ------------------------------------------------------------------------
>> Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
>> at java.lang.Thread.start0(Native Method)
>> at java.lang.Thread.start(Unknown Source)
>> at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:116)
>> at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
>> at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
>> at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
>> at org.apache.hadoop.fs.FsShell.init(FsShell.java:41)
>> at org.apache.hadoop.fs.FsShell.run(FsShell.java:809)
>> at kingsoft.lab.duba.CustomInterface.CreateDir(CustomInterface.java:138)
>> at kingsoft.lab.duba.CustomInterface.main(CustomInterface.java:155)
>> ------------------------------------------------------------------------
>> 
>> Then, is there a recommended API for this kind of use?
>> By "this" I mean uploading or downloading files and creating directories
>> programmatically, even with concurrent operations.
>> 
>> 
>> Raghu Angadi wrote:
>>> 
>>> 
>>> Can you get the stack trace of the threads that are left? It was not 
>>> obvious from the code where a thread is started. It might be 'trash 
>>> handler'.
>>> 
>>> You could add sleep(10sec) to give you enough time to get the trace.
>>> 
>>> FsShell might not be designed for this use, but seems like a pretty 
>>> useful feature.
>>> 
>>> Raghu.
>>> 
>>> KrzyCube wrote:
>>>> I have tried the way TestDFSShell.java does,
>>>> here's my code:
>>>> 
>>>> ------------------------------------------------------------
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.fs.FsShell;
>>>> 
>>>> public class CustomInterface
>>>> {
>>>>     Configuration conf;
>>>>     FsShell fs;
>>>> 
>>>>     public CustomInterface()
>>>>     {
>>>>         conf = new Configuration();
>>>>         fs = new FsShell();
>>>> 
>>>>         fs.setConf(conf);
>>>>     }
>>>> 
>>>>     // Runs "-mkdir <strPath><strDirName>" through FsShell and returns its exit code.
>>>>     public int createDir(String strDirName, String strPath) throws Exception
>>>>     {
>>>>         // exception handling omitted
>>>>         String[] strCmd = new String[2];
>>>>         strCmd[0] = "-mkdir";
>>>>         strCmd[1] = strPath + strDirName;
>>>>         return fs.run(strCmd);
>>>>     }
>>>> }
>>>> ------------------------------------------------------------
>>>> 
>>>> Then I just call the createDir method:
>>>> 
>>>> for (int i = 0; i < 100000; i++)
>>>> {
>>>>     // "/" is a placeholder parent path; the original post omitted this argument
>>>>     custom.createDir("someName", "/");
>>>> }
>>>> 
>>>> This causes the JVM process to hold many threads,
>>>> and these threads eat memory
>>>> until the JVM heap is used up and exceptions are thrown.
>>>> A larger heap size only allows more threads; it does not fix the problem.
>>>> 
>>>> thanks.
>>>> 
>>>> 
>>>> Dhruba Borthakur wrote:
>>>>> One example of programmatically using FsShell is in
>>>>> src/test/org/apache/hadoop/dfs/TestDFSShell.java
>>>>>
>>>>> Thanks,
>>>>> dhruba
>>>>>
>>>>> -----Original Message-----
>>>>> From: KrzyCube [mailto:yuxh312@gmail.com] 
>>>>> Sent: Monday, July 23, 2007 7:49 PM
>>>>> To: hadoop-user@lucene.apache.org
>>>>> Subject: Calling FsShell.doMain() hold so many threads
>>>>>
>>>>>
>>>>> Hi there,
>>>>>
>>>>> I have two questions.
>>>>>
>>>>> Q1:
>>>>>     I am trying to call FsShell.doMain() from my own code, which is
>>>>> only a simple wrapper around FsShell.
>>>>> But when I try to create many directories (10000 or so), an exception
>>>>> like "Not enough memory for more threads" is thrown, even though I
>>>>> have set -Xmx512m.
>>>>>     I then watched the process info while the program was running and
>>>>> found that more and more threads are started during the run, eating
>>>>> more and more memory, and none of them ever exit.
>>>>>     I then went to the source code and found that FsShell.main(), the
>>>>> entry point for terminal use, contains the line
>>>>> "System.exit(return_value_of_doMain)".
>>>>> Does that mean that ToolBase.run(), as implemented in FsShell.java,
>>>>> always creates a new thread that has to be forcibly terminated by
>>>>> System.exit() killing the process?
>>>>>     If so, how can I write my own code to use Hadoop through FsShell
>>>>> in multi-threaded mode, or is there another way to do this?
>>>>>
>>>>> Q2:
>>>>>      I checked out the code from svn and ran it in Eclipse [I mention
>>>>> Eclipse only to indicate my environment] under Ubuntu 7.04.
>>>>>      Out of curiosity, I wanted to see how much time FsShell.doMain()
>>>>> takes, so I used "new Date()" and computed the interval with
>>>>> "DateEnd.getTime() - DateBeg.getTime()".
>>>>>      I found that even mkdir takes more than 1000 ms [as reported by
>>>>> getTime()]. With no arguments it takes 25, but even if I give it a
>>>>> wrong argument, such as "-sl", it takes more than 1000. Does that mean
>>>>> the argument check accounts for most of the time?
>>>>>
>>>>> -- 
>>>>> View this message in context:
>>>>> http://www.nabble.com/Calling-FsShell.doMain%28%29-hold-so-many-threads-tf4133557.html#a11756139
>>>>> Sent from the Hadoop Users mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> 
>>> 
>>> 
>>> 
>> 
> 
