hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Hive Class path libjars, auxjars, etc
Date Thu, 30 Jul 2009 18:54:58 GMT
On Fri, Jul 24, 2009 at 1:45 PM, Edward Capriolo<edlinuxguru@gmail.com> wrote:
> On Fri, Jul 24, 2009 at 1:36 PM, Zheng Shao<zshao9@gmail.com> wrote:
>> Hive only needs to be installed on the node that runs the hive query.
>> All the jars will be sent to the hadoop JobClient via -libjars. The
>> code is in ExecDriver.java.
>>
>> In hadoop 0.17, I don't think there is a way to add a path to the
>> classpath for a job (unless we put it in hadoop-env.sh and start the
>> TaskTracker with that path). Are there any changes in later
>> versions?
>>
>>
>>
>> Zheng
>>
>>
>>
>> On 7/24/09, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>> I have been following some threads on the hadoop mailing list about
>>> speeding up MR jobs. I have a few questions I am sure I could answer
>>> by digging into the source code, but I thought I could get a
>>> quick answer here.
>>>
>>> 1 ADD JAR 'myfile.jar' uses the distributed cache. Using the
>>> distributed cache has some overhead. I know that if I create an auxlib
>>> directory under the hive root, its jars will be added to -libjars on
>>> startup. If I add my jar to auxlib on all my nodes, will a UDF in the
>>> jar be available during subsequent jobs? Or is it only necessary to add
>>> those jars to auxlib on the node I start the job from?
>>>
>>> 2 Dealing with the entire hive install: how much of the hive install
>>> really needs to be replicated on each datanode? If we used the
>>> distributed cache for everything, the jobs would have unneeded
>>> overhead, but hive would be 'installed on demand' from the client.
>>>
>>> Thanks,
>>> Edward
>>>
>>
>>
>> Yours,
>> Zheng
>>
>
> Zheng,
>
> A thread on the hadoop list piqued my interest. Search for
> "hadoop jobs take long time to setup":
>
> http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200906.mbox/%3C7e536b1f0906281408n1c2484bfve6dc1ea339110e9d@mail.gmail.com%3E
>
> Can hive benefit?
> Edward
>

Could we use something like this for a performance increase? Assuming
the jars are already present on all TaskTrackers, could we have an
alternate invocation script such as bin/hive-local?

Edward
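The two options discussed in this thread can be contrasted in a short Java sketch. The first ships the jars from an auxlib-style directory with every job through Hadoop's -libjars option. The second is the proposed bin/hive-local mode, which assumes the jars are already on every TaskTracker's classpath and so ships nothing. The sketch assumes that -libjars takes a comma-separated list of jar paths. `LibJarsSketch` and `buildLibJarsArgs` are hypothetical names, and this is not the code in Hive's ExecDriver.java.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LibJarsSketch {

    /**
     * Hypothetical helper: builds the extra job-submission arguments for the
     * jars found in an auxlib-style directory.
     *
     * @param auxlibDir        local directory holding the extra jars (assumed layout)
     * @param jarsPreinstalled true for the proposed "hive-local" mode, where the
     *                         jars are assumed to already be on every
     *                         TaskTracker's classpath, so nothing is shipped
     */
    static List<String> buildLibJarsArgs(Path auxlibDir, boolean jarsPreinstalled)
            throws IOException {
        List<String> args = new ArrayList<>();
        if (jarsPreinstalled || !Files.isDirectory(auxlibDir)) {
            return args; // nothing to ship with the job
        }
        // Collect only *.jar files; anything else in the directory is ignored.
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(auxlibDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toAbsolutePath().toString());
            }
        }
        if (jars.isEmpty()) {
            return args;
        }
        Collections.sort(jars); // deterministic order for the argument list
        args.add("-libjars");
        args.add(String.join(",", jars)); // comma-separated, as -libjars expects
        return args;
    }

    public static void main(String[] argv) throws IOException {
        // Build a throwaway directory that stands in for auxlib.
        Path dir = Files.createTempDirectory("auxlib");
        Files.createFile(dir.resolve("my-udfs.jar"));
        Files.createFile(dir.resolve("serde.jar"));
        Files.createFile(dir.resolve("notes.txt")); // not a jar, so not shipped
        System.out.println(buildLibJarsArgs(dir, false)); // jars shipped per job
        System.out.println(buildLibJarsArgs(dir, true));  // pre-installed: nothing shipped
    }
}
```

Running `main` prints the -libjars argument pair for the shipped case, then an empty list for the pre-installed case. The temporary directory path varies from run to run. The pre-installed mode avoids the per-job distributed-cache overhead, but only if every TaskTracker's classpath already contains the jars, for example via hadoop-env.sh as Zheng notes. That in turn means restarting the TaskTrackers whenever those jars change.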
