hadoop-common-user mailing list archives

From Jason Venner <ja...@attributor.com>
Subject Re: Reusing jobs
Date Fri, 18 Apr 2008 15:55:04 GMT
When there are non-daemon threads still running (JMX threads being our #1
cause), the JVM will not exit without help.

This is in TaskTracker.java. In 0.16.0 it is at line 2088, in the finally
clause of Child.main:

        LogManager.shutdown();
        // Force the JVM to exit even if it has threads still running; this
        // prevents memory-expensive JVMs from being left around.
        System.exit( 0 );
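
To illustrate the behaviour described above, here is a minimal standalone
sketch. It is not the Hadoop code: the class ForceExitSketch and its helper
methods are hypothetical names made up for this example, and the sleeping
thread only stands in for a JMX or worker thread that a task leaves behind.

```java
class ForceExitSketch {

    /** Starts a thread that sleeps until interrupted (a stand-in for a leftover JMX/worker thread). */
    static Thread startLingeringThread(boolean daemon) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // Interrupted: fall through and let the thread end.
            }
        }, daemon ? "lingering-daemon" : "lingering-user");
        t.setDaemon(daemon); // must be set before start()
        t.start();
        return t;
    }

    /** A live non-daemon thread keeps the JVM running after main() returns. */
    static boolean blocksJvmExit(Thread t) {
        return t.isAlive() && !t.isDaemon();
    }

    public static void main(String[] args) {
        Thread leftover = startLingeringThread(false);
        try {
            System.out.println("task finished; leftover thread blocks exit: "
                    + blocksJvmExit(leftover));
        } finally {
            // Without this call the process would stay up for as long as
            // the non-daemon thread is alive.
            System.exit(0);
        }
    }
}
```

If the System.exit(0) line is removed, the process should hang after printing
its message, because the non-daemon thread is still alive. Marking the thread
as a daemon is the other way out, but that is not always possible for threads
started by third-party code.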


Devaraj Das wrote:
> Jason, didn't get that. The jvm should exit naturally even without calling
> System.exit. Where exactly did you insert the System.exit?  Please clarify.
> Thanks! 
>
>   
>> -----Original Message-----
>> From: Jason Venner [mailto:jason@attributor.com] 
>> Sent: Friday, April 18, 2008 6:48 PM
>> To: core-user@hadoop.apache.org
>> Subject: Re: Reusing jobs
>>
>> We have terrible issues with threads in the JVMs holding 
>> down resources and causing the compute nodes to run out of 
>> memory and lock up. We in fact patch the JobTracker to cause 
>> the mapper/reducer JVM to System.exit, to ensure that the 
>> resources are freed.
>>
>> This is particularly a problem for mappers/reducers that 
>> enable JMX or spin off many threads for internal processing.
>>
>> Our solution is to tune the input split size so that the 
>> minimum mapper time is > 1 minute.
>>
>> Karl Wettin wrote:
>>> Ted Dunning wrote:
>>>> Hadoop has enormous startup costs that are relatively inherent in the
>>>> current design.
>>>>
>>>> Most notably, mappers and reducers are executed in a standalone JVM
>>>> (ostensibly for safety reasons).
>>>
>>> Is it possible to hack in support to reuse JVMs? Keep it alive until
>>> timed out and have it execute the jobs by opening a socket and saying
>>> hello? What classes should I start looking in? Could be a fun exercise.
>>>
>>>           karl
>>>
>>>
>>>
>>>       
>>>>
>>>> On 4/17/08 6:00 PM, "Karl Wettin" <karl.wettin@gmail.com> wrote:
>>>>
>>>>         
>>>>> Is it possible to execute a job more than once?
>>>>>
>>>>> I use map reduce when adding a new instance to a hierarchical cluster
>>>>> tree. It finds the least distant node and inserts the new instance
>>>>> as a sibling to that node.
>>>>>
>>>>> As far as I know it is in the very nature of this algorithm that one
>>>>> inserts one instance at a time, and that this is how the second
>>>>> dimension is created that makes it better than a vector cluster. It
>>>>> would be possible to map all permutations of instances and skip the
>>>>> reduction, but that would result in many more calculations than
>>>>> iteratively training the tree, as the latter only requires one to test
>>>>> against the instances already inserted into the tree.
>>>>>
>>>>> Iteratively training this tree using Hadoop means executing one job
>>>>> per instance that measures the distance to all instances in a file that
>>>>> I also append the new instance to once it is inserted in the tree.
>>>>>
>>>>> All of the above is very inefficient, especially with a young tree that
>>>>> could be trained in nanoseconds locally. So I do that until it takes
>>>>> 20 seconds to insert an instance.
>>>>>
>>>>> But really, this is all Hadoop framework overhead. I'm not quite
>>>>> sure of all it does when I execute a job, but it seems like quite a
>>>>> lot. And all I'm doing is executing a couple of identical jobs over
>>>>> and over again using new data.
>>>>>
>>>>> It would be very nice if it just took a few milliseconds to do that.
>>>>>
>>>>>        karl
>
>   
-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested
