hive-user mailing list archives

From Kevin Weiler <Kevin.Wei...@imc-chicago.com>
Subject python UDF and Avro tables
Date Thu, 24 Jul 2014 15:52:03 GMT
Hi All,

I hope I’m not duplicating a previous question, but I couldn’t find any search functionality
for the user list archives.

I have written a relatively simple Python script that is meant to take a field from a Hive
query and transform it (just some string processing through a dict), provided certain conditions
are met. After reading this guide:

http://blog.spryinc.com/2013/09/a-guide-to-user-defined-functions-in.html

it would appear that the Python script needs to read the table's native file format (in
my case Avro) from STDIN and write to STDOUT. I implemented this using the Python fastavro
deserializer and cStringIO for the STDIN/STDOUT bit. I then placed the appropriate Python
modules on all the nodes (which I could probably do a bit better by simply storing them in HDFS).
Unfortunately, I'm still getting errors while trying to transform my field, which are appended
below. I believe the problem is that the input files can end up being split at arbitrary points,
so a mapper can receive a chunk of Avro data with no schema header at the top. Has anyone had
any luck running a Python UDF on an Avro table? Cheers!
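For reference, the dict-based string processing itself can be sketched roughly like this (the mapping and column layout here are made up, and this assumes the script receives tab-separated lines, which is what Hive's TRANSFORM hands a script for plain text tables rather than the Avro case described above):

```python
import sys

# Hypothetical lookup table -- the real dict comes from the actual script.
RENAMES = {"foo": "FOO_FIXED", "bar": "BAR_FIXED"}

def transform_line(line):
    """Rewrite the first column through the dict; pass everything else through."""
    cols = line.rstrip("\n").split("\t")
    cols[0] = RENAMES.get(cols[0], cols[0])
    return "\t".join(cols)

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform_line(line))
```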


Traceback (most recent call last):
  File "coltoskip.py", line 33, in <module>
    reader = avro.reader(avrofile)
  File "_reader.py", line 368, in fastavro._reader.iter_avro.__init__ (fastavro/_reader.c:6438)
ValueError: cannot read header - is it an avro file?
org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying
to close the Operator running your custom script.
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:514)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
[the same HiveException and stack trace were repeated twice more]
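The "cannot read header" error above fits the split theory: an Avro object container file starts with the 4-byte magic `Obj\x01` followed by the file metadata (including the writer schema), so a script handed bytes from the middle of a file has nothing fastavro can parse as a header. A quick stdlib-only check (a sketch, not the author's code):

```python
import io

# Avro object container files begin with the magic bytes "Obj" + 0x01,
# followed by file metadata (including the writer schema).
AVRO_MAGIC = b"Obj\x01"

def looks_like_avro_container(stream):
    """True if the stream starts with the Avro container-file magic."""
    return stream.read(len(AVRO_MAGIC)) == AVRO_MAGIC

# A whole file starts with the magic; an arbitrary mid-file split does not.
print(looks_like_avro_container(io.BytesIO(b"Obj\x01...header...")))    # True
print(looks_like_avro_container(io.BytesIO(b"\x00\x14mid-block data")))  # False
```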

--
Kevin Weiler
IT

IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.weiler@imc-chicago.com<mailto:Kevin.Weiler@imc-chicago.com>

