From: Brad Ruderman <bruderman@radiumone.com>
Date: Mon, 28 Apr 2014 10:22:57 -0700
Subject: Re: Problem adding jar using pyhs2
To: user@hive.apache.org

Hi David-

Can you test the code below? It is working for me. Make sure your jar is in HDFS and that you use the FQDN when referencing it.
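If the jar isn't in HDFS yet, one quick way to get it there from Python is to shell out to the Hadoop CLI. This is only a sketch; the local and destination paths below are placeholders, not paths from your setup:

import subprocess

# Copy the UDF jar into HDFS so HiveServer2 sessions can fetch it.
# Source and destination paths are examples only; adjust for your cluster.
subprocess.check_call(["hadoop", "fs", "-put",
                       "nexr-hive-udf-0.2-SNAPSHOT.jar",
                       "/nexr-hive-udf-0.2-SNAPSHOT.jar"])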
import pyhs2

with pyhs2.connect(host='127.0.0.1',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Register the jar from HDFS for this session, then expose the UDF
        cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
        cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")

        # Execute query
        cur.execute("select substr(description,2,4) from sample_07")

        # Return column info from query
        print cur.getSchema()

        # Fetch table results
        for i in cur.fetch():
            print i

Thanks,
Brad

On Mon, Apr 28, 2014 at 7:39 AM, David Engel wrote:
> Thanks for your response.
>
> We've essentially done your first suggestion in the past by copying or
> symlinking our jar into Hive's lib directory.  It works, but we'd like
> a better way for different users to use different versions of our
> jar during development.  Perhaps that's not possible, though, without
> running completely different instances of Hive.
>
> I don't think your second suggestion will work.  The original problem
> is that when "add jar file.jar" is run through pyhs2, the full
> command gets passed to AddResourceProcessor.run(), yet
> AddResourceProcessor.run() is written such that it only expects "jar
> file.jar" to be passed to it.  That's how it appears to work when
> "add jar file.jar" is run from a stand-alone Hive CLI and from Beeline.
>
> David
>
> On Sat, Apr 26, 2014 at 12:14:53AM -0700, Brad Ruderman wrote:
> > An easy solution would be to add the jar to the classpath or auxlibs,
> > so that every instance of Hive already has the jar and you just need
> > to create the temporary function.
> >
> > Otherwise you can put the JAR in HDFS and reference it from ADD JAR
> > using the hdfs scheme.  Example:
> >
> > import pyhs2
> >
> > with pyhs2.connect(host='127.0.0.1',
> >                    port=10000,
> >                    authMechanism="PLAIN",
> >                    user='root',
> >                    password='test',
> >                    database='default') as conn:
> >     with conn.cursor() as cur:
> >         cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
> >         cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
> >         # Execute query
> >         cur.execute("select substr(description,2,4) from sample_07")
> >
> >         # Return column info from query
> >         print cur.getSchema()
> >
> >         # Fetch table results
> >         for i in cur.fetch():
> >             print i
> >
> >
> > On Fri, Apr 25, 2014 at 7:54 AM, David Engel wrote:
> >
> > > Hi,
> > >
> > > I'm trying to convert some of our Hive queries to use the pyhs2 Python
> > > package (https://github.com/BradRuderman/pyhs2).  Because we have our
> > > own jar with some custom SerDes and UDFs, we need to use the "add jar
> > > /path/to/my.jar" command to make them available to Hive.  This works
> > > fine using the Hive CLI directly and also with the Beeline client.  It
> > > doesn't work, however, with pyhs2.
> > >
> > > I naively tracked the problem down to a bug in
> > > AddResourceProcessor.run().  See HIVE-6971 in Jira.  My attempted fix
> > > turned out not to be correct because it breaks the "add" command when
> > > used from the CLI and Beeline.  It seems the "add" part of any "add
> > > file|jar|archive ..." command needs to get stripped off somewhere
> > > before it gets passed to AddResourceProcessor.run().  Unfortunately, I
> > > can't find that location when the command is received from pyhs2.  Can
> > > someone help?
> > >
> > > David
> > > --
> > > David Engel
> > > david@istwok.net
> >
>
> --
> David Engel
> david@istwok.net
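PS - on the question of different users needing different versions of the jar during development: since ADD JAR only applies to the current session, each developer could publish a build under their own HDFS prefix and parameterize the path. A minimal sketch, assuming the hdfs:// form works for you as it does for me, and assuming a per-user /user/<name>/udfs layout (my invention, not something from this thread):

import getpass
import pyhs2

# Hypothetical per-user layout: each developer uploads their own build
# to /user/<name>/udfs in HDFS. Adjust host, port, and credentials.
jar_path = ("hdfs://sandbox.hortonworks.com:8020/user/%s/udfs/"
            "nexr-hive-udf-0.2-SNAPSHOT.jar" % getpass.getuser())

with pyhs2.connect(host='127.0.0.1',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        # ADD JAR is scoped to this session, so users don't clash
        cur.execute("ADD JAR " + jar_path)
        cur.execute("CREATE TEMPORARY FUNCTION substr AS "
                    "'com.nexr.platform.hive.udf.UDFSubstrForOracle'")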