hadoop-mapreduce-user mailing list archives

From Azuryy <azury...@gmail.com>
Subject Re: Is Hadoop's ToolRunner thread-safe?
Date Fri, 21 Mar 2014 20:53:53 GMT
Yes, this is the best way to go.

> On Mar 22, 2014, at 3:03, Something Something <mailinglists19@gmail.com> wrote:
> 
> I will be happy to follow all these steps if someone confirms that this is the best way to handle it.  Seems harmless to me, but just wondering.  Thanks.
> 
> 
>> On Fri, Mar 21, 2014 at 1:26 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>> JIRA, test, patch and review? I am sure the community would welcome it. And if you don't, well, it is unlikely to appear in Hadoop trunk anytime soon.
>> 
>> Bertrand
>> 
>> 
>>> On Fri, Mar 21, 2014 at 12:49 AM, Something Something <mailinglists19@gmail.com> wrote:
>>> Confirmed that ToolRunner is NOT thread-safe:
>>> 
>>> Original code (which runs into problems):
>>> 
>>>   public static int run(Configuration conf, Tool tool, String[] args) 
>>>     throws Exception{
>>>     if(conf == null) {
>>>       conf = new Configuration();
>>>     }
>>>     GenericOptionsParser parser = new GenericOptionsParser(conf, args);
>>>     //set the configuration back, so that Tool can configure itself
>>>     tool.setConf(conf);
>>>     
>>>     //get the args w/o generic hadoop args
>>>     String[] toolArgs = parser.getRemainingArgs();
>>>     return tool.run(toolArgs);
>>>   }
>>> 
>>> 
>>> 
>>> 
>>> 
>>> New code (which works):
>>> 
>>>     public static int run(Configuration conf, Tool tool, String[] args)
>>>             throws Exception{
>>>         if(conf == null) {
>>>             conf = new Configuration();
>>>         }
>>>         GenericOptionsParser parser = getParser(conf, args);
>>> 
>>>         tool.setConf(conf);
>>> 
>>>         //get the args w/o generic hadoop args
>>>         String[] toolArgs = parser.getRemainingArgs();
>>>         return tool.run(toolArgs);
>>>     }
>>> 
>>>     private static synchronized GenericOptionsParser getParser(Configuration conf, String[] args) throws Exception {
>>>         return new GenericOptionsParser(conf, args);
>>>     }
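
The same idea can be tried outside Hadoop. What follows is a minimal sketch, not the ToolRunner patch itself: `guardedParse` is a hypothetical stand-in for the `new GenericOptionsParser(conf, args)` call, and the lambda body stands in for `tool.run(toolArgs)`. Only the parsing step is serialized behind a lock; the jobs themselves still run in parallel on the pool. The class, method, and lock names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedParseDemo {
    // One JVM-wide lock guarding the non-thread-safe parsing step.
    private static final Object PARSE_LOCK = new Object();

    // Bookkeeping used only to show that the guarded section is never entered twice at once.
    private static final AtomicInteger inParse = new AtomicInteger();
    private static final AtomicInteger maxInParse = new AtomicInteger();

    // Hypothetical stand-in for "new GenericOptionsParser(conf, args).getRemainingArgs()".
    static String[] guardedParse(String[] args) {
        synchronized (PARSE_LOCK) {
            int now = inParse.incrementAndGet();
            maxInParse.accumulateAndGet(now, Math::max);
            try {
                return args.clone();
            } finally {
                inParse.decrementAndGet();
            }
        }
    }

    // Submits `jobs` tasks to a fixed pool and returns how many of them failed.
    static int runJobs(int poolSize, int jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < jobs; i++) {
                final String[] args = {"job-" + i};
                futures.add(pool.submit(() -> {
                    String[] toolArgs = guardedParse(args); // serialized
                    return toolArgs.length == 1 ? 0 : 1;    // stand-in for tool.run(toolArgs)
                }));
            }
            int failures = 0;
            for (Future<Integer> f : futures) {
                try {
                    if (f.get() != 0) {
                        failures++;
                    }
                } catch (Exception e) {
                    failures++;
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    // Highest number of threads ever observed inside guardedParse at the same time.
    static int maxConcurrentParsers() {
        return maxInParse.get();
    }

    public static void main(String[] args) {
        System.out.println("failures = " + runJobs(2, 8));
        System.out.println("max concurrent parsers = " + maxConcurrentParsers());
    }
}
```

Note that a `static synchronized` method locks on its own class object, so a guard like this only protects callers that go through it. Any other code in the same JVM that touches the shared builder directly would still be unprotected.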
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Wed, Mar 19, 2014 at 10:15 AM, Something Something <mailinglists19@gmail.com> wrote:
>>>> I would like to trigger a few Hadoop jobs simultaneously.  I’ve created a pool of threads using Executors.newFixedThreadPool.  The idea is that if the pool size is 2, my code will trigger 2 Hadoop jobs at exactly the same time using ‘ToolRunner.run’.  In my testing, I noticed that these 2 threads keep stepping on each other.
>>>> 
>>>> When I looked under the hood, I noticed that ToolRunner creates a GenericOptionsParser, which in turn calls a static method, ‘buildGeneralOptions’.  This method uses ‘OptionBuilder.withArgName’, which stores its argument in a static field called ‘argName’ that is shared by all threads.  This doesn’t look thread-safe to me, and I believe it is the root cause of the issues I am running into.
>>>> 
>>>> Any thoughts?
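
To make the suspected failure mode concrete, here is a small sketch of what can go wrong when a builder keeps its pending settings in state shared by every caller. `SharedBuilder` is a hypothetical stand-in, not the real OptionBuilder, and its reset-on-create behavior is an assumption made for illustration. The interleaving of two callers is replayed by hand on a single thread so the outcome is repeatable; with real threads the same ordering would only happen some of the time.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticBuilderDemo {

    // Hypothetical builder that holds the pending argument name in a static field.
    static final class SharedBuilder {
        private static String argName; // one slot shared by every caller in the JVM

        static void withArgName(String name) {
            argName = name;
        }

        static String create(String opt) {
            String built = opt + "<" + argName + ">";
            argName = null; // assumed: the builder clears its state after each create
            return built;
        }
    }

    // Replays two callers (A and B) whose two-step builder calls interleave.
    static List<String> interleaved() {
        List<String> out = new ArrayList<>();
        SharedBuilder.withArgName("paths");          // caller A, step 1
        SharedBuilder.withArgName("property=value"); // caller B, step 1 overwrites A's name
        out.add(SharedBuilder.create("libjars"));    // caller A, step 2 picks up B's name
        out.add(SharedBuilder.create("D"));          // caller B, step 2 finds the slot cleared
        return out;
    }

    public static void main(String[] args) {
        // prints [libjars<property=value>, D<null>]
        System.out.println(interleaved());
    }
}
```

Caller A ends up with caller B's argument name, and caller B ends up with none at all, which is the kind of "stepping on each other" that two concurrent parser constructions could produce.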