hadoop-hive-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Why does the Hive CLI start a subprocess?
Date Thu, 10 Dec 2009 19:39:00 GMT

On Thu, Dec 10, 2009 at 1:00 PM, Ning Zhang <nzhang@facebook.com> wrote:
> The cmdLine is calling the shell script hadoop, so I guess it is a better
> isolation from different hadoop versions.  Just my thought.
>
> On Dec 10, 2009, at 9:51 AM, Philip Zeyliger wrote:
>
>> Anyone?
>>
>> On Wed, Dec 2, 2009 at 5:27 PM, Philip Zeyliger <philip@cloudera.com> wrote:
>>
>>> Hi folks,
>>>
>>> I notice that Hive's hive.ql.exec.MapRedTask calls out to a subprocess
>>> ("executor = Runtime.getRuntime().exec(cmdLine);") to run MR tasks.
>>> Out of curiosity, what's the motivation?  It seems (naively, I'm sure)
>>> that you could start the MR from within the same JVM.
>>>
>>> Thanks,
>>>
>>> -- Philip
>>>
>
>
Philip,

I am not very well versed in this section of the codebase, but I
think the biggest reason may be that the classpath of the task is not
the classpath of the parent.

If you look a little above the line you quoted:
    executor = Runtime.getRuntime().exec(cmdLine);

You see stuff like:

    if (ShimLoader.getHadoopShims().usesJobShell()) {
      jarCmd = libJarsOption + hiveJar + " " + ExecDriver.class.getName();
    } else {
      jarCmd = hiveJar + " " + ExecDriver.class.getName() + libJarsOption;
    }

    String cmdLine = hadoopExec + " jar " + jarCmd +
      " -plan " + planFile.toString() + " " + isSilent + " " + hiveConfArgs;

So the subprocess is definitely launched with libjars that the parent
does not have, and those libjars are not needed by the CLI itself.
Does that make sense?
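
To make the ordering logic above concrete, here is a minimal standalone
sketch of how such a command could be assembled and launched as a child
process. It is not the Hive code: the class name, method names, and all
argument values are hypothetical, the libjars option is modelled as a list
of strings (an assumption about its shape), and the isSilent and
hiveConfArgs pieces are left out for brevity.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hive.
class SubprocessClasspathSketch {

    // Assembles an argument list shaped like the cmdLine string in the
    // snippet above. libJarsArgs stands in for libJarsOption (assumed shape).
    static List<String> buildCommand(String hadoopExec,
                                     String hiveJar,
                                     String driverClass,
                                     List<String> libJarsArgs,
                                     boolean usesJobShell,
                                     String planFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(hadoopExec);
        cmd.add("jar");
        if (usesJobShell) {
            // libjars option goes before the jar and driver class
            cmd.addAll(libJarsArgs);
            cmd.add(hiveJar);
            cmd.add(driverClass);
        } else {
            // libjars option goes after the jar and driver class
            cmd.add(hiveJar);
            cmd.add(driverClass);
            cmd.addAll(libJarsArgs);
        }
        cmd.add("-plan");
        cmd.add(planFile);
        return cmd;
    }

    // Starts the command as a child process and waits for its exit code.
    // The child's classpath is whatever the launched script sets up, which
    // is independent of the classpath of the JVM calling this method.
    static int launch(List<String> cmd) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(cmd).inheritIO().start();
        return child.waitFor();
    }
}
```

Passing the arguments as a list to ProcessBuilder, rather than one
concatenated string to Runtime.exec, is a choice made for this sketch: it
avoids whitespace-splitting surprises when a path contains spaces.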

Edward
