From: Kevin Weiler
To: user@hive.apache.org
Subject: python UDF and Avro tables
Date: Thu, 24 Jul 2014 15:52:03 +0000

Hi All,

I hope I'm not duplicating a previous question, but I couldn't find any search functionality for the user list archives.

I have written a relatively simple Python script that is meant to take a field from a Hive query and transform it (just some string processing through a dict) given that certain conditions are met. After reading this guide:

http://blog.spryinc.com/2013/09/a-guide-to-user-defined-functions-in.html

it would appear that the Python script needs to read the native file format (in my case Avro) from STDIN and write to STDOUT. I implemented this using the Python fastavro deserializer, with cStringIO for the STDIN/STDOUT bit. I then placed the appropriate Python modules on all the nodes (which I could probably do a bit better by simply storing them in HDFS). Unfortunately, I'm still getting errors while trying to transform my field; they are appended below. I believe the problem is that HDFS can end up splitting the files at arbitrary points, so a task could be handed a piece of an Avro file with no schema header at the top. Has anyone had any luck running a Python UDF on an Avro table? Cheers!
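For comparison, here is a minimal sketch of a streaming script written against the convention described in the Hive documentation for TRANSFORM: by default each row reaches the script's STDIN as one line of tab-separated text, and tab-separated lines are read back from STDOUT, whatever the table's storage format. The lookup table `REPLACEMENTS`, the column index, and the function names are hypothetical; the actual mapping and conditions in coltoskip.py are not shown in this message.

```python
import sys

# Hypothetical lookup table; the real dict used by the script is not shown.
REPLACEMENTS = {"foo": "bar"}


def transform_line(line, column=0, mapping=REPLACEMENTS):
    """Rewrite one tab-delimited row, replacing the value in `column` if it is mapped."""
    fields = line.rstrip("\n").split("\t")
    if column < len(fields):
        # Values without an entry in the mapping pass through unchanged.
        fields[column] = mapping.get(fields[column], fields[column])
    return "\t".join(fields)


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows line by line and write one transformed line per input row."""
    for line in stdin:
        stdout.write(transform_line(line) + "\n")
```

A real script would end with a call to `main()`. It could then be shipped to the task nodes with `ADD FILE` and invoked through `SELECT TRANSFORM(...) USING 'python coltoskip.py' AS (...)`, which avoids copying it to every node by hand.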

Traceback (most recent call last):
  File "coltoskip.py", line 33, in <module>
    reader = avro.reader(avrofile)
  File "_reader.py", line 368, in fastavro._reader.iter_avro.__init__ (fastavro/_reader.c:6438)
ValueError: cannot read header - is it an avro file?
org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
	at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:514)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
	at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:514)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
	at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:514)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
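
The ValueError in the traceback suggests the reader did not find an Avro container header at the start of its input. One way to check what the script is actually being handed is to look at the first bytes: an Avro object container file starts with the four magic bytes `Obj` followed by the byte 1. The sketch below uses only the standard library; the function name and both sample inputs are made up for illustration.

```python
import io

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def starts_with_avro_header(stream):
    """Return True if the binary stream begins with the Avro container magic bytes."""
    return stream.read(len(AVRO_MAGIC)) == AVRO_MAGIC


# A row of plain tab-delimited text (made-up sample).
text_row = io.BytesIO(b"some_value\t42\n")
# The start of a container file: magic bytes plus placeholder content (made-up sample).
container_start = io.BytesIO(AVRO_MAGIC + b"placeholder")

print(starts_with_avro_header(text_row))         # False
print(starts_with_avro_header(container_start))  # True
```

If the same check on the script's real STDIN returns False, the input is not a complete container file, which would be consistent with the "cannot read header" message.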
--
Kevin Weiler
IT
IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.weiler@imc-chicago.com