hadoop-hive-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Why does the Hive CLI start a subprocess?
Date Thu, 10 Dec 2009 21:24:00 GMT
On Thu, Dec 10, 2009 at 3:52 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> On Thu, Dec 10, 2009 at 3:08 PM, Philip Zeyliger <philip@cloudera.com> wrote:
>>> >
>>> > So definitely the subprocess has different libjars. But those libjars
>>> > are not needed by the CLI. Does that make sense?
>>>
>>
>> Sure.  In theory, though, you could do stuff like
>> "Thread.currentThread().setContextClassLoader(...)".  Hadoop does this on
>> occasion already.
>>
>> I'm curious if anyone's tried that approach and failed. :)
>>
>> -- Philip
>>
>
> Philip,
>
>> "Thread.currentThread().setContextClassLoader(...)".
>
> Really there is no need to add jars to the CLI. The UDF never needs to
> be on the classpath of the CLI.
>
> If you look at the comment..
> /**
>  * Alternate implementation (to ExecDriver) of spawning a mapreduce task
>  * that runs it from a separate jvm. The primary issue with this is the
>  * inability to control logging from a separate jvm in a consistent manner
>  **/
>
> So this must have been done deliberately. I am guessing that some
> Hadoop/Hive singletons might be involved, and spawning a separate task
> gives you total isolation.
>
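
For reference, the classloader trick Philip describes boils down to the
standard JDK mechanism sketched below. This is only a minimal illustration
of the idea, not Hive code; the jar path and UDF class name are placeholders.

import java.net.URL;
import java.net.URLClassLoader;

// Minimal sketch (not Hive code): make a UDF jar visible through the thread's
// context classloader without putting it on the CLI's own classpath.
// The jar path and class name below are placeholders.
public class ContextClassLoaderSketch {
  public static void main(String[] args) throws Exception {
    ClassLoader original = Thread.currentThread().getContextClassLoader();
    URL[] udfJars = { new URL("file:///tmp/my-udf.jar") };           // placeholder jar
    URLClassLoader withUdfs = new URLClassLoader(udfJars, original);
    Thread.currentThread().setContextClassLoader(withUdfs);
    try {
      // Code that resolves classes via the context classloader can now see the UDF.
      Class<?> udf = Class.forName("com.example.MyUdf", true, withUdfs); // placeholder class
      System.out.println("Loaded " + udf.getName());
    } finally {
      Thread.currentThread().setContextClassLoader(original);        // restore afterwards
    }
  }
}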

I think the closest thing we have talked about is this...

https://issues.apache.org/jira/browse/HIVE-744

This came about because I use

https://issues.apache.org/jira/browse/HIVE-617

to launch my jobs.

There is definitely some logic to running MR jobs in a single JVM. If you
can track a job from a thread, you have more control over it than you do
over an external process.
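
As a rough illustration of the kind of control I mean, the sketch below
submits a job from the same JVM and keeps the Job handle on the submitting
thread, where it can be polled or killed directly. The paths and job setup
are placeholders, not anything Hive actually does.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal sketch (hypothetical paths and config): submit a Hadoop job from
// this JVM and keep a handle to it, instead of forking a child process and
// losing direct control of it.
public class InJvmJobSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "in-jvm-example");                   // default (identity) mapper/reducer
    FileInputFormat.addInputPath(job, new Path("/tmp/in"));      // placeholder paths
    FileOutputFormat.setOutputPath(job, new Path("/tmp/out"));

    job.submit();  // returns immediately; the Job handle stays in this JVM

    // Because we hold the Job object, the submitting thread can poll progress
    // or call job.killJob(); with a forked child process you only get the exit
    // code unless you parse its output.
    while (!job.isComplete()) {
      System.out.printf("map %.0f%% reduce %.0f%%%n",
          job.mapProgress() * 100, job.reduceProgress() * 100);
      Thread.sleep(5000);
    }
    System.out.println(job.isSuccessful() ? "done" : "failed");
  }
}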
