hive-user mailing list archives

From: Edward Capriolo
Subject: Hive Class path libjars, auxjars, etc
Date: Fri, 24 Jul 2009 16:42:25 GMT

I have been following some threads on the hadoop mailing list about
speeding up MR jobs. I have a few questions. I am sure I could find
the answers by digging into the source code, but I thought I might get
a quicker answer here.

1. ADD JAR 'myfile.jar' uses the distributed cache, which has some
overhead. I know that if I create an auxlib directory under the hive
root, the jars in it will be added to libjars on startup. If I add my
jar to auxlib on all my nodes, will a UDF in the jar be available
during subsequent jobs? Or is it only necessary to add those jars to
auxlib on the node I start the job from?

2. Dealing with the entire hive install: how much of the hive install
really needs to be replicated on each datanode? If we used the
distributed cache for everything, the jobs would have unneeded
overhead, but hive would be 'installed on demand' from the client.

