From: Brad Ruderman <bruderman@radiumone.com>
Date: Mon, 28 Apr 2014 10:22:57 -0700
Subject: Re: Problem adding jar using pyhs2
To: user@hive.apache.org

Hi David-

Can you test the code below? It is working for me. Make sure your jar is in HDFS and that you use the FQDN when referencing it.
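If the jar isn't in HDFS yet, one quick way to get it there from Python is to shell out to the Hadoop CLI. This is only a sketch; the local and destination paths below are placeholders, not paths from your setup:

import subprocess

# Copy the UDF jar into HDFS so HiveServer2 sessions can fetch it.
# Source and destination paths are examples only; adjust for your cluster.
subprocess.check_call(["hadoop", "fs", "-put",
                       "nexr-hive-udf-0.2-SNAPSHOT.jar",
                       "/nexr-hive-udf-0.2-SNAPSHOT.jar"])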
import pyhs2

with pyhs2.connect(host='127.0.0.1',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Register the jar from HDFS for this session, then expose the UDF
        cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
        cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")

        # Execute query
        cur.execute("select substr(description,2,4) from sample_07")

        # Return column info from query
        print cur.getSchema()

        # Fetch table results
        for i in cur.fetch():
            print i

Thanks,
Brad

On Mon, Apr 28, 2014 at 7:39 AM, David Engel wrote:
> Thanks for your response.
>
> We've essentially done your first suggestion in the past by copying or
> symlinking our jar into Hive's lib directory.  It works, but we'd like
> a better way for different users to use different versions of our
> jar during development.  Perhaps that's not possible, though, without
> running completely different instances of Hive.
>
> I don't think your second suggestion will work.  The original problem
> is that when "add jar file.jar" is run through pyhs2, the full
> command gets passed to AddResourceProcessor.run(), yet
> AddResourceProcessor.run() is written such that it only expects "jar
> file.jar" to be passed to it.  That's how it appears to work when
> "add jar file.jar" is run from a stand-alone Hive CLI and from Beeline.
>
> David
>
> On Sat, Apr 26, 2014 at 12:14:53AM -0700, Brad Ruderman wrote:
> > An easy solution would be to add the jar to the classpath or auxlibs,
> > so that every instance of Hive already has the jar and you just need
> > to create the temporary function.
> >
> > Otherwise you can put the JAR in HDFS and reference it from ADD JAR
> > using the hdfs scheme.  Example:
> >
> > import pyhs2
> >
> > with pyhs2.connect(host='127.0.0.1',
> >                    port=10000,
> >                    authMechanism="PLAIN",
> >                    user='root',
> >                    password='test',
> >                    database='default') as conn:
> >     with conn.cursor() as cur:
> >         cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
> >         cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
> >         # Execute query
> >         cur.execute("select substr(description,2,4) from sample_07")
> >
> >         # Return column info from query
> >         print cur.getSchema()
> >
> >         # Fetch table results
> >         for i in cur.fetch():
> >             print i
> >
> >
> > On Fri, Apr 25, 2014 at 7:54 AM, David Engel wrote:
> >
> > > Hi,
> > >
> > > I'm trying to convert some of our Hive queries to use the pyhs2 Python
> > > package (https://github.com/BradRuderman/pyhs2).  Because we have our
> > > own jar with some custom SerDes and UDFs, we need to use the "add jar
> > > /path/to/my.jar" command to make them available to Hive.  This works
> > > fine using the Hive CLI directly and also with the Beeline client.  It
> > > doesn't work, however, with pyhs2.
> > >
> > > I naively tracked the problem down to a bug in
> > > AddResourceProcessor.run().  See HIVE-6971 in Jira.  My attempted fix
> > > turned out not to be correct because it breaks the "add" command when
> > > used from the CLI and Beeline.  It seems the "add" part of any "add
> > > file|jar|archive ..." command needs to get stripped off somewhere
> > > before it gets passed to AddResourceProcessor.run().  Unfortunately, I
> > > can't find that location when the command is received from pyhs2.  Can
> > > someone help?
> > >
> > > David
> > > --
> > > David Engel
> > > david@istwok.net
> >
>
> --
> David Engel
> david@istwok.net
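PS - on the question of different users needing different versions of the jar during development: since ADD JAR only applies to the current session, each developer could publish a build under their own HDFS prefix and parameterize the path. A minimal sketch, assuming the hdfs:// form works for you as it does for me, and assuming a per-user /user/<name>/udfs layout (my invention, not something from this thread):

import getpass
import pyhs2

# Hypothetical per-user layout: each developer uploads their own build
# to /user/<name>/udfs in HDFS. Adjust host, port, and credentials.
jar_path = ("hdfs://sandbox.hortonworks.com:8020/user/%s/udfs/"
            "nexr-hive-udf-0.2-SNAPSHOT.jar" % getpass.getuser())

with pyhs2.connect(host='127.0.0.1',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        # ADD JAR is scoped to this session, so users don't clash
        cur.execute("ADD JAR " + jar_path)
        cur.execute("CREATE TEMPORARY FUNCTION substr AS "
                    "'com.nexr.platform.hive.udf.UDFSubstrForOracle'")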