hive-dev mailing list archives

From "Ivan Mitic" <iva...@microsoft.com>
Subject Re: Review Request 22329: HIVE-7190. WebHCat launcher task failure can cause two concurrent user jobs to run
Date Mon, 09 Jun 2014 21:21:46 GMT


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > 1. I think webhcat-default.xml should be modified to include the jars that are now required in templeton.libjars to minimize out-of-the-box config for end users.
> > 2. Is there any test (e2e) that can be added for this? (with reasonable amount of effort)
> > 3. When you tested that Pig/Hive jobs get properly tagged, you mean you tested that MR jobs that are generated by Pig/Hive are tagged, correct?
> 
> Eugene Koifman wrote:
>     4. Actually, instead of doing 1, could WebHCat dynamically figure out which hadoop version it's talking to and add only the necessary shim jar, rather than shipping all of them? It reduces the amount of config needed. It would also be better if we can only ship the minimal set of jars.
>

1. I like your proposal from #4. I actually started down this route but ran into some issues when I tried to add libjars programmatically. Let me try harder and I'll reply back.
2. Will have to check out what we have currently.
3. Correct, I validated that MR jobs generated by Pig/Hive are tagged properly. 


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java, line 44
> > <https://reviews.apache.org/r/22329/diff/1/?file=604984#file604984line44>
> >
> >     I think it would be useful to add a more detailed description of these props. Something like what is in the JIRA ticket. I would have added the ticket number to the comment, but Hive prohibits that.

Will fix this, thanks


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java, line 126
> > <https://reviews.apache.org/r/22329/diff/1/?file=604985#file604985line126>
> >
> >     Which user will this use?  Is it the user running WebHCat or the value of 'doAs' parameter?

This runs in the context of the task itself. In unsecure Hadoop, that is the same context as the nodemanager/tasktracker. In secure Hadoop, I believe it is the context of the user submitting the job.


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java, line 157
> > <https://reviews.apache.org/r/22329/diff/1/?file=604987#file604987line157>
> >
> >     Is LOG.info() the right log level?  Seems like it will pollute the log file.

I think this is totally fine, it's just a single entry in the task syslog. This is super useful info (IMO a must-have) for users to understand what the templeton launcher job does.


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java, line 160
> > <https://reviews.apache.org/r/22329/diff/1/?file=604987#file604987line160>
> >
> >     Is LOG.info() the right level?

I think this is ok.


> On June 7, 2014, 1:05 a.m., Eugene Koifman wrote:
> > shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java, line 189
> > <https://reviews.apache.org/r/22329/diff/1/?file=604987#file604987line189>
> >
> >     log level

Same as above, I think this is ok. 


- Ivan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22329/#review44992
-----------------------------------------------------------


On June 6, 2014, 10:02 p.m., Ivan Mitic wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22329/
> -----------------------------------------------------------
> 
> (Updated June 6, 2014, 10:02 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The approach in the patch is similar to what Oozie does to handle this situation. Specifically, all child map jobs get tagged with the launcher MR job id. On launcher task restart, the launcher queries the RM for the list of jobs that have the tag and kills them. After that it moves on to start the same child job again. Again, similarly to what Oozie does, a new templeton.job.launch.time property is introduced that captures the launcher job submit timestamp and is later used to reduce the search window when the RM is queried.
> 
> To validate the patch, you will need to add webhcat shim jars to templeton.libjars as now webhcat launcher also has a dependency on hadoop shims.
> 
> I have noticed that in case of the SqoopDelegator webhcat currently does not set the MR delegation token when optionsFile flag is used. This also creates the problem in this scenario. This looks like something that should be handled via a separate Jira.
> 
> 
> Diffs
> -----
> 
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java 23b1c4f 
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java 41b1dc5 
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java 04a5c6f 
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java 04e061d 
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/SqoopDelegator.java adcd917 
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobSubmissionConstants.java a6355a6 
>   hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LaunchMapper.java 556ee62 
>   shims/0.20S/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim20S.java d3552c1 
>   shims/0.23/src/main/java/org/apache/hadoop/mapred/WebHCatJTShim23.java 5a728b2 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 299e918 
> 
> Diff: https://reviews.apache.org/r/22329/diff/
> 
> 
> Testing
> -------
> 
> I have validated that MR, Pig and Hive jobs do get tagged appropriately. I have also validated that previous child jobs do get killed on RM failover/task failure.
> 
> 
> Thanks,
> 
> Ivan Mitic
> 
>
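
The cleanup step from the review description above (tag child jobs with the launcher job id, then on launcher restart find and kill the tagged jobs, using the launch time to narrow the search) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: ChildJobCleanup, Job and findChildJobsToKill are hypothetical names, and the in-memory list stands in for whatever the RM returns through the WebHCat shim. The real implementation would query the RM and issue kills rather than filter a local list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: all names here are hypothetical and are not
// classes or methods from the patch, Hive, or Hadoop.
final class ChildJobCleanup {

    /** Simplified stand-in for a job entry as the RM might report it. */
    static final class Job {
        final String id;
        final Set<String> tags;
        final long startTimeMs;
        final boolean running;

        Job(String id, Set<String> tags, long startTimeMs, boolean running) {
            this.id = id;
            this.tags = tags;
            this.startTimeMs = startTimeMs;
            this.running = running;
        }
    }

    /**
     * Returns the ids of jobs that a restarted launcher task should kill:
     * still running, tagged with the launcher job id, and started no earlier
     * than the launcher submit time (the assumed role of
     * templeton.job.launch.time as a lower bound on the search window).
     */
    static List<String> findChildJobsToKill(List<Job> reported,
                                            String launcherJobId,
                                            long launchTimeMs) {
        List<String> toKill = new ArrayList<>();
        for (Job job : reported) {
            boolean tagged = job.tags.contains(launcherJobId);
            boolean inWindow = job.startTimeMs >= launchTimeMs;
            if (job.running && tagged && inWindow) {
                toKill.add(job.id);
            }
        }
        return toKill;
    }

    public static void main(String[] args) {
        String launcher = "job_launcher_1";
        Set<String> tag = Collections.singleton(launcher);
        Set<String> none = Collections.<String>emptySet();
        List<Job> reported = Arrays.asList(
            new Job("job_child_1", tag, 2000L, true),   // leftover child: kill
            new Job("job_child_2", tag, 3000L, false),  // already finished
            new Job("job_other", none, 2500L, true),    // not tagged
            new Job("job_old", tag, 500L, true));       // before launch time
        System.out.println(findChildJobsToKill(reported, launcher, 1000L));
    }
}
```

Only after the returned jobs have been killed would the launcher go on to start the same child job again, which is what keeps two copies of the user job from running concurrently.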

