hive-user mailing list archives

From David Engel <da...@istwok.net>
Subject Re: Problem adding jar using pyhs2
Date Mon, 28 Apr 2014 14:39:38 GMT
Thanks for your response.

We've essentially done your first suggestion in the past by copying or
symlinking our jar into Hive's lib directory.  It works, but we'd like
a better way for different users to use different versions of our jar
during development.  Perhaps that's not possible, though, without
running completely different instances of Hive.
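
For concreteness, what we've been doing amounts to roughly the sketch
below; the paths are only illustrative and the lib location depends on
the install:

import os

# Illustrative paths only -- adjust for the actual Hive install and the
# user's own build of the jar.
HIVE_LIB = "/usr/lib/hive/lib"
user_jar = os.path.expanduser("~/build/my.jar")

link = os.path.join(HIVE_LIB, "my.jar")
if os.path.lexists(link):
    os.remove(link)
os.symlink(user_jar, link)  # Hive picks the jar up from lib/ at start-up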

I don't think your second suggestion will work.  The original problem
is that when "add jar file.jar" is run through pyhs2, the full command
gets passed to AddResourceProcessor.run(), yet
AddResourceProcessor.run() is written to expect only "jar file.jar".
That is what it appears to receive when "add jar file.jar" is run from
a stand-alone Hive CLI or from Beeline.

David

On Sat, Apr 26, 2014 at 12:14:53AM -0700, Brad Ruderman wrote:
> An easy solution would be to add the jar to the classpath or auxlibs, so
> every instance of Hive already has the jar and you just need to create
> the temporary function.
> 
> Otherwise you can put the JAR in HDFS and reference it in the "add jar"
> command using the hdfs scheme. Example:
> 
> import pyhs2
> 
> with pyhs2.connect(host='127.0.0.1',
>                    port=10000,
>                    authMechanism="PLAIN",
>                    user='root',
>                    password='test',
>                    database='default') as conn:
>     with conn.cursor() as cur:
> cur.execute("ADD JAR hdfs://
> sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
>  cur.execute("CREATE TEMPORARY FUNCTION substr AS
> 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
>     #Execute query
>         cur.execute("select substr(description,2,4) from sample_07")
> 
>         #Return column info from query
>         print cur.getSchema()
> 
>         #Fetch table results
>         for i in cur.fetch():
>             print i
> 
> 
> On Fri, Apr 25, 2014 at 7:54 AM, David Engel <david@istwok.net> wrote:
> 
> > Hi,
> >
> > I'm trying to convert some of our Hive queries to use the pyhs2 Python
> > package (https://github.com/BradRuderman/pyhs2).  Because we have our
> > own jar with some custom SerDes and UDFs, we need to use the "add jar
> > /path/to/my.jar" command to make them available to Hive.  This works
> > fine using the Hive CLI directly and also with the Beeline client.  It
> > doesn't work, however, with pyhs2.
> >
> > I naively tracked the problem down to a bug in
> > AddResourceProcessor.run().  See HIVE-6971 in Jira.  My attempted fix
> > turned out to not be correct because it breaks the "add" command when
> > used from the CLI and Beeline.  It seems the "add" part of any "add
> > file|jar|archive ..." command needs to get stripped off somewhere
> > before it gets passed to AddResourceProcessor.run().  Unfortunately, I
> > can't find that location when the command is received from pyhs2.  Can
> > someone help?
> >
> > David
> > --
> > David Engel
> > david@istwok.net
> >

-- 
David Engel
david@istwok.net
