hive-user mailing list archives

From Yogesh Keshetty <yogesh.keshe...@outlook.com>
Subject RE: Hive Generic UDF invoking Hbase
Date Wed, 30 Sep 2015 22:52:34 GMT

Ryan - Yes, I have written UDFs and Generic UDFs before, but this is the first time I have
written a UDF that calls HBase tables.
Jason - Yes, in my Generic UDF I am using org.apache.hadoop.hbase.client.HTable. On the
Hive side we set the auxiliary-jars property to add the HBase-related JARs. The relevant
HBase JARs are already set on the classpath.
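For reference, the auxiliary-jars setup described above usually looks something like the following - the jar names and paths here are placeholders, not the actual ones from this cluster. ADD JAR also ships the jars to the MR tasks via the distributed cache, which matters for the task-side ClassNotFoundException discussed later in this thread:

```sql
-- Make the HBase client classes and the UDF jar available to the session
-- and to the launched MR tasks (paths are illustrative placeholders):
ADD JAR /path/to/hbase-client.jar;
ADD JAR /path/to/hbase-common.jar;
ADD JAR /path/to/my-udf.jar;
```

Alternatively, hive.aux.jars.path can be pointed at the same jars in hive-site.xml or on the CLI at startup.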
Subject: Re: Hive Generic UDF invoking Hbase
From: jdere@hortonworks.com
To: user@hive.apache.org
Date: Wed, 30 Sep 2015 22:43:17 +0000

So your custom UDF is using org.apache.hadoop.hbase.client.HTable?
How do you resolve your UDF JAR (and this class) on the Hive client - are you doing ADD JAR,
or are your UDF JARs and HBase JARs on your Hive classpath?
From: Ryan Harris <Ryan.Harris@zionsbancorp.com>

Sent: Wednesday, September 30, 2015 3:19 PM

To: user@hive.apache.org

Subject: RE: Hive Generic UDF invoking Hbase
 



Without seeing the code I really can't help.
Have you written other functioning UDFs? Are you aware of the requirements? https://cwiki.apache.org/confluence/display/Hive/HivePlugins
 
 


From: Yogesh Keshetty [mailto:yogesh.keshetty@outlook.com]


Sent: Wednesday, September 30, 2015 3:19 PM

To: user@hive.apache.org; user@hive.apache.org

Subject: RE: Hive Generic UDF invoking Hbase


 

I believe it's not because of the classpath - for a single task / for streaming it works fine,
right?

Sent from Outlook

 

On Wed, Sep 30, 2015 at 1:58 PM -0700, "Ryan Harris" <Ryan.Harris@zionsbancorp.com>
wrote:



Are all tasks failing with the same error message?
 
based on this:

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.HTable
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 
I'd guess that there may be some classpath issue on your datanodes?
I don't have as much experience troubleshooting custom UDFs, hopefully someone else will have
better insights for you there.
 
 


From: Yogesh Keshetty [mailto:yogesh.keshetty@outlook.com]


Sent: Wednesday, September 30, 2015 2:48 PM

To: Hive community

Subject: RE: Hive Generic UDF invoking Hbase


 

Jason and Ryan,

 


Thanks for the solutions. It's now launching in MapReduce mode. However, we are encountering
another issue now that the UDF executes in parallel: inside the generic UDF we process the
records and store them in HBase record by record, and the job is getting killed. I am assuming
this is because all the tasks are trying to access the same HBase table in parallel? This
was working with just streaming. Is there any setting that should be enabled?


 


Please find the stack trace below.
Error during job, obtaining debugging information...
Examining task ID: task_1443279785342_0017_m_000000 (and more) from job job_1443279785342_0017
 
Task with the most failures(4):
-----
Task ID:
  task_1443279785342_0017_m_000001
 
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException:
java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer"
for class:
 com.ga.fishbowl.CustomerMatchingPayment_Test
Serialization trace:
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:423)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:286)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:263)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:478)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:471)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:648)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:172)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:414)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException:
Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer"
for class: com.ga.fishbowl.CustomerMatchingPayment_Test
Serialization trace:
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
        at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1025)
        at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:933)
        at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:947)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:390)
        ... 13 more
Caused by: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer"
for class: com.ga.fishbowl.CustomerMatchingPayment_Test
        at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:45)
        at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:26)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.newDefaultSerializer(Kryo.java:343)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.getDefaultSerializer(Kryo.java:336)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.registerImplicit(DefaultClassResolver.java:56)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:476)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:148)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
        ... 43 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hive.com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:32)
        ... 52 more
Caused by: java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hbase/client/HTable;
        at java.lang.Class.getDeclaredFields0(Native Method)
        at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
        at java.lang.Class.getDeclaredFields(Class.java:1811)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.rebuildCachedFields(FieldSerializer.java:150)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.<init>(FieldSerializer.java:109)
        ... 56 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.HTable
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 61 more
 
 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
 




Subject: Re: Hive Generic UDF invoking Hbase

From: jdere@hortonworks.com

To: user@hive.apache.org

Date: Wed, 30 Sep 2015 17:19:18 +0000
Take a look at hive.fetch.task.conversion in https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
and try setting it to "none" or "minimal".
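For example, from the Hive CLI or a script - "none" disables fetch-task conversion entirely, so the query is forced through a real MR job (value names as documented on that wiki page):

```sql
SET hive.fetch.task.conversion=none;
SELECT membership(c1,c2,c3,c4,c5,c5,c7) from MemberTable;
```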
 





From: Ryan Harris <Ryan.Harris@zionsbancorp.com>

Sent: Wednesday, September 30, 2015 9:19 AM

To: user@hive.apache.org

Subject: RE: Hive Generic UDF invoking Hbase


 




This may be a bit of a 'hack', but I've found that basic select-only operations will often cause
Hive to stream data without running the job through an actual
 MR phase. That would typically be a logical approach for a "give me everything" query if
it were not for the UDF...
 
Try adding a basic WHERE clause to the query and see if that changes the behavior, e.g.
SELECT membership(c1,c2,c3,c4,c5,c5,c7) from MemberTable where c1 is not NULL;
 
 


From: Yogesh Keshetty [mailto:yogesh.keshetty@outlook.com]


Sent: Tuesday, September 29, 2015 11:02 PM

To: Hive community

Subject: RE: Hive Generic UDF invoking Hbase


 

Thanks for the reply Douglas.

 


We haven't set up Tez yet. The default execution mode is MR.


I checked the log files. There is no sign of any MapReduce activity for the program - no
MapReduce job is generated, and I can't see the job with the "hadoop job -list" command
either.


 


When we were testing the generic UDFs in Hive 0.13, every insertion into HBase would
trigger an MR program. When we migrated to Hive 0.14 / 1.0, it no longer generates any
MR for the same activity. I don't know what has changed internally. Has anyone tried
calling HBase tables from Hive UDFs? Please help us.


 


Thanks in advance!


 




From: Douglas.Moore@thinkbiganalytics.com

To: user@hive.apache.org

Subject: Re: Hive Generic UDF invoking Hbase

Date: Wed, 30 Sep 2015 03:24:53 +0000

I'm guessing you might be using Tez now where you were using MR before.


You can tell Hive to run in MapReduce mode by setting the Hive execution engine from within
the Hive script.


 


See this page for details


https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
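A minimal example of what that looks like in a Hive script, assuming the hive.execution.engine property documented on that page (values "mr" and "tez"):

```sql
-- Force the MapReduce execution engine for this session
SET hive.execution.engine=mr;
```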


 


To answer your question, though: for jobs that have stopped running you can look at the YARN
job logs - see https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs -
or the scheduler page on the ResourceManager. The scheduler page will show running jobs and
how many containers they are using.
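As a sketch, for the failed job ID that appears earlier in this thread, the aggregated logs could be fetched with something like the following - the application ID is conventionally the job ID with the job_ prefix replaced by application_, so adjust for your cluster:

```shell
# Pull aggregated task logs for a finished/failed YARN application
yarn logs -applicationId application_1443279785342_0017
```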


 


I'm not familiar with the MapR management UIs, but they should have a UI to show running jobs,
and you can drill down to see tasks/containers.


 


Hope this helps




Sent from my iPhone




On Sep 29, 2015, at 9:39 PM, Yogesh Keshetty <yogesh.keshetty@outlook.com> wrote:




Hi,
 
I have a quick question about Hive Generic UDFs. We are trying to do some CRUD operations
on HBase tables from a Hive generic UDF. The issue is that until Hive 0.13 it would
generate a map reduce task where we could track the status of execution. Once we migrated
to Hive 1.0 it doesn't show any status; it is probably streaming the data. How
can we know if it is using multiple mappers for the job?
 
I thought this process would be pretty fast in terms of performance, but it is
taking far longer than we estimated: for 11.2 million records it has been more
than 8 hours and it is still in progress.
 
Use Case:
 
Let us say my table name is "MemberTable". The generic UDF name is "Membership", which
accepts
n columns as parameters. Inside the UDF we run an internal algorithm and
insert the values into multiple HBase tables.
 
Sample Query:
 
CREATE TEMPORARY FUNCTION membership as 'com.fishbowl.udf.membership';
 
SELECT membership(c1,c2,c3,c4,c5,c5,c7) from MemberTable;
 
 
Cluster info:
4 Node cluster (each 32 GB)
Hive version: 1.0
Hbase Version: 0.98.12
Distro: MapR

 
 
Thanks in advance!
 
PS: This is really urgent, I hope someone can help us asap.
 
Thank you,
Yogesh
 










THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain
information that is privileged and exempt from disclosure under applicable law.
 If you are neither the intended recipient nor responsible for delivering the message to the
intended recipient, please note that any dissemination, distribution, copying or the taking
of any action in reliance upon the message is strictly prohibited. If you
 have received this communication in error, please notify the sender immediately. Thank you.


