hive-user mailing list archives

From David Engel <da...@istwok.net>
Subject Re: Problem adding jar using pyhs2
Date Tue, 29 Apr 2014 16:52:07 GMT
Hi Brad,

Your test, after editing for local host/file names, etc., worked.  It
must be something else I'm doing wrong in my development stuff.  At
least I know it should work.  I'll figure it out eventually.  Thanks
again.

David
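
For anyone else adapting the quoted test to a different cluster, the
host and file names can be factored out of the ADD JAR statement.  This
is only a minimal sketch: the helper name add_jar_statement is
hypothetical, and the host, port, and jar name below are simply the
values from the quoted example.

```python
def add_jar_statement(namenode, port, jar_path):
    """Build an ADD JAR statement for a jar stored in HDFS.

    Hypothetical helper for illustration; not part of pyhs2 or Hive.
    """
    # HDFS paths are absolute, so make sure the path starts with a slash.
    if not jar_path.startswith("/"):
        jar_path = "/" + jar_path
    return "ADD JAR hdfs://%s:%d%s" % (namenode, port, jar_path)


# Values taken from the quoted example; substitute your own FQDN and jar.
stmt = add_jar_statement("sandbox.hortonworks.com", 8020,
                         "nexr-hive-udf-0.2-SNAPSHOT.jar")
```

The resulting string would then be passed to cur.execute() in place of
the hard-coded statement, so each user could point at their own copy of
the jar.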

On Mon, Apr 28, 2014 at 10:22:57AM -0700, Brad Ruderman wrote:
> Hi David-
> Can you test the code? It is working for me. Make sure your jar is in HDFS
> and you are using the FQDN for referencing it.
> 
> import pyhs2
> 
> with pyhs2.connect(host='127.0.0.1',
>                    port=10000,
>                    authMechanism="PLAIN",
>                    user='root',
>                    password='test',
>                    database='default') as conn:
>     with conn.cursor() as cur:
>         cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
>         cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
>         #Execute query
>         cur.execute("select substr(description,2,4) from sample_07")
> 
>         #Return column info from query
>         print cur.getSchema()
> 
>         #Fetch table results
>         for i in cur.fetch():
>             print i
> 
> Thanks,
> Brad
> 
> 
> On Mon, Apr 28, 2014 at 7:39 AM, David Engel <david@istwok.net> wrote:
> 
> > Thanks for your response.
> >
> > We've essentially done your first suggestion in the past by copying or
> > symlinking our jar into Hive's lib directory.  It works, but we'd like
> > a better way for different users to use different versions of our
> > jar during development.  Perhaps that's not possible, though, without
> > running completely different instances of Hive.
> >
> > I don't think your second suggestion will work.  The original problem
> > is that when "add jar file.jar" is run through pyhs2, the full
> > command gets passed to AddResourceProcessor.run(), yet
> > AddResourceProcessor.run() is written such that it only expects "jar
> > file.jar" to get passed to it.  That's how it appears to work when
> > "add jar file.jar" is run from a stand-alone Hive CLI and from beeline.
> >
> > David
> >
> > On Sat, Apr 26, 2014 at 12:14:53AM -0700, Brad Ruderman wrote:
> > > An easy solution would be to add the jar to the classpath or auxlibs
> > > therefore every instance of hive already has the jar and you just need to
> > > create the temporary function.
> > >
> > > Else you can put the JAR in HDFS and reference the add jar using the hdfs
> > > scheme. Example:
> > >
> > > import pyhs2
> > >
> > > with pyhs2.connect(host='127.0.0.1',
> > >                    port=10000,
> > >                    authMechanism="PLAIN",
> > >                    user='root',
> > >                    password='test',
> > >                    database='default') as conn:
> > >     with conn.cursor() as cur:
> > >         cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
> > >         cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
> > >         #Execute query
> > >         cur.execute("select substr(description,2,4) from sample_07")
> > >
> > >         #Return column info from query
> > >         print cur.getSchema()
> > >
> > >         #Fetch table results
> > >         for i in cur.fetch():
> > >             print i
> > >
> > >
> > > On Fri, Apr 25, 2014 at 7:54 AM, David Engel <david@istwok.net> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm trying to convert some of our Hive queries to use the pyhs2 Python
> > > > package (https://github.com/BradRuderman/pyhs2).  Because we have our
> > > > own jar with some custom SerDes and UDFs, we need to use the "add jar
> > > > /path/to/my.jar" command to make them available to Hive.  This works
> > > > fine using the Hive CLI directly and also with the Beeline client.  It
> > > > doesn't work, however, with pyhs2.
> > > >
> > > > I naively tracked the problem down to a bug in
> > > > AddResourceProcessor.run().  See HIVE-6971 in Jira.  My attempted fix
> > > > turned out to not be correct because it breaks the "add" command when
> > > > used from the CLI and Beeline.  It seems the "add" part of any "add
> > > > file|jar|archive ..." command needs to get stripped off somewhere
> > > > before it gets passed to AddResourceProcessor.run().  Unfortunately, I
> > > > can't find that location when the command is received from pyhs2.  Can
> > > > someone help?
> > > >
> > > > David
> > > > --
> > > > David Engel
> > > > david@istwok.net
> > > >
> >
> > --
> > David Engel
> > david@istwok.net
> >

-- 
David Engel
david@istwok.net
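
The mismatch described in the quoted messages (the processor expecting
"jar file.jar" but receiving the full "add jar file.jar") can be
illustrated with a toy model.  This is a sketch only and not Hive's
actual code: both function names are hypothetical, and the behavior is
assumed from the description in this thread.

```python
# Toy model of the mismatch discussed in this thread; NOT Hive's real code.
RESOURCE_TYPES = {"file", "jar", "archive"}


def run_add_resource(args):
    """Hypothetical processor that expects '<file|jar|archive> <path>'."""
    tokens = args.split()
    if len(tokens) < 2 or tokens[0].lower() not in RESOURCE_TYPES:
        raise ValueError("expected '<file|jar|archive> <path>', got: %r" % args)
    return tokens[0].lower(), tokens[1:]


def strip_add_keyword(command):
    """Hypothetical dispatch step: drop the leading 'add' keyword."""
    parts = command.strip().split(None, 1)
    if len(parts) < 2 or parts[0].lower() != "add":
        raise ValueError("not an add command: %r" % command)
    return parts[1]


# With the keyword stripped first, the processor sees what it expects.
ok = run_add_resource(strip_add_keyword("add jar file.jar"))

# Handing the processor the full command fails in this model, because
# 'add' is not a resource type.
try:
    run_add_resource("add jar file.jar")
    full_command_rejected = False
except ValueError:
    full_command_rejected = True
```

In this model the fix belongs in whichever layer dispatches the command,
not in the processor itself, which matches the observation that changing
the processor breaks the CLI and Beeline paths.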
