hadoop-hive-dev mailing list archives

From Prasad Chakka <pcha...@facebook.com>
Subject Re: contrib modules for hive
Date Thu, 09 Apr 2009 18:48:02 GMT
I prefer giving options at install time rather than at runtime, which can lead to bugs and
non-deterministic issues depending on how a class loader works. It may be that a particular extension
is required only at runtime and not at parse time, so the class loader might know of this
extension (and the jar).

But giving options like the one below can be cumbersome if more than a few extensions
need to be installed. Maybe a separate config file, or the jars that have been built, might
be a better option.
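
A minimal sketch of the config-file idea, written as a standalone Python packaging helper rather than as part of the Hive build. The file name `contrib-modules.conf`, the one-module-per-line format, and the `<contrib build dir>/<module>/*.jar` layout are all assumptions made for illustration; none of them come from Hive itself.

```python
from pathlib import Path
import shutil


def parse_module_list(text):
    """Return module names from config text: one per line, '#' starts a comment."""
    modules = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name and name not in modules:
            modules.append(name)
    return modules


def install_contrib_jars(config_file, contrib_build_dir, auxlib_dir):
    """Copy the jars of each listed module into auxlib; return copied file names."""
    modules = parse_module_list(Path(config_file).read_text())
    auxlib = Path(auxlib_dir)
    auxlib.mkdir(parents=True, exist_ok=True)
    copied = []
    for module in modules:
        # Assumed layout: <contrib_build_dir>/<module>/*.jar
        jars = sorted((Path(contrib_build_dir) / module).glob("*.jar"))
        if not jars:
            raise FileNotFoundError(
                f"no built jars found for contrib module {module!r}"
            )
        for jar in jars:
            shutil.copy2(jar, auxlib / jar.name)
            copied.append(jar.name)
    return copied
```

With a config file that contains the single line `tfiletransport`, only that module's jars would land in auxlib. Adding another extension means adding one line to the file instead of another command-line flag.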

From: Joydeep Sen Sarma <jssarma@facebook.com>
Reply-To: <hive-dev@hadoop.apache.org>
Date: Thu, 9 Apr 2009 11:36:41 -0700
To: <hive-dev@hadoop.apache.org>
Subject: contrib modules for hive

I am working on the tfiletransport stuff. It doesn't make sense to make stuff like this part
of the standard Hive distribution (in fact, we could probably say that about the Thrift serdes
already, since in all likelihood only FB uses them). It makes the stuff that we ship out
to the distributed cache bigger and bigger (so there's a real cost here).

Breaking out custom formats and serdes into contrib directories is easy. I am less sure about
how to deal with their jar files. One simple option is to provide an option in the
package command to optionally package jars from specific contrib modules into auxlib in the
distribution tree. Something like:

ant -Dpackage.contrib.tfiletransport=true package

A more sophisticated route would be to have our own classloader that automatically figures
out which jars are required, brings only those into the working environment, and ships only those
to the distributed cache. That would require more work.
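
The "figure out what jars are required" step can be sketched independently of any classloader. The Python below is only an illustration of that lookup, not a proposal for how Hive would implement it: it indexes the `.class` entries of a set of jars (jars are zip archives) and returns the jars that provide a given list of class names. The function names and the "first jar wins" rule for duplicate classes are assumptions.

```python
import zipfile
from pathlib import Path


def index_jars(jar_paths):
    """Map fully qualified class name -> jar path (first jar listed wins)."""
    index = {}
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            for entry in zf.namelist():
                if entry.endswith(".class"):
                    # org/example/Foo.class -> org.example.Foo
                    class_name = entry[: -len(".class")].replace("/", ".")
                    index.setdefault(class_name, str(jar))
    return index


def jars_needed(class_names, index):
    """Return the sorted list of jars that provide the given classes."""
    missing = [c for c in class_names if c not in index]
    if missing:
        raise KeyError(f"no jar provides: {missing}")
    return sorted({index[c] for c in class_names})
```

Given the serde and format classes a query actually references, `jars_needed` would yield the subset of contrib jars worth shipping to the distributed cache. Discovering those class names in the first place is the harder part and is not covered here.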

