hive-user mailing list archives

From Yogesh Keshetty <yogesh.keshe...@outlook.com>
Subject RE: Hive Generic UDF invoking Hbase
Date Wed, 30 Sep 2015 05:01:42 GMT
Thanks for the reply Douglas.
We haven't set up Tez yet; the default execution mode is MR. I checked the log files, and there
is no trace of a MapReduce job for this query. No MapReduce job is generated at all, and
"hadoop job -list" does not show one either.
When we tested the generic UDFs in Hive 0.13, Hive triggered a MapReduce job for every
insertion into HBase. Since we migrated to Hive 0.14/1.0, it no longer generates any MapReduce
job for the same activity. I don't know what has changed internally. Has anyone tried to call
HBase tables from Hive UDFs? Please help us.
Thanks in advance!
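
One possible explanation, offered as an assumption rather than something confirmed in this thread: starting with Hive 0.14 the default of hive.fetch.task.conversion changed from "minimal" to "more", so a plain SELECT that only applies a UDF to a table can be served by a fetch task without launching a MapReduce job. A minimal sketch for testing that theory in a session, reusing the table and function names from the original message (the column list is shortened for illustration):

```sql
-- Print the value currently in effect for this session.
SET hive.fetch.task.conversion;

-- Assumption: with fetch conversion disabled, Hive plans a regular
-- cluster job for the query instead of a local fetch task.
SET hive.fetch.task.conversion=none;

-- The temporary function must already be registered in this session.
SELECT membership(c1, c2, c3) FROM MemberTable;
```

If a job then shows up in the job list again, the change in behavior between 0.13 and 0.14/1.0 is most likely this default and not something in the UDF.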

From: Douglas.Moore@thinkbiganalytics.com
To: user@hive.apache.org
Subject: Re: Hive Generic UDF invoking Hbase
Date: Wed, 30 Sep 2015 03:24:53 +0000

I'm guessing you might now be using Tez where you were using MR before.
You can tell Hive to run in MapReduce mode by setting the Hive execution engine from within
the Hive script.



See this page for details
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
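
As a sketch of what that could look like at the top of a script, assuming the property meant here is hive.execution.engine from the page above:

```sql
-- Print the engine currently in effect for this session.
SET hive.execution.engine;

-- Force MapReduce for the statements that follow in this session.
SET hive.execution.engine=mr;
```

The setting only applies to the current session, so it has to appear before the query it is meant to affect.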



To answer your question, though: for jobs that have stopped running, you can look at the YARN
job logs (https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs).
For running jobs, check the scheduler page on the ResourceManager, which shows running jobs and
how many containers they are using.



I'm not familiar with the MapR management UIs, but they should have a UI that shows running
jobs and lets you drill down to see tasks/containers.



Hope this helps


Sent from my iPhone


On Sep 29, 2015, at 9:39 PM, Yogesh Keshetty <yogesh.keshetty@outlook.com> wrote:

Hi,
 
I have a quick question about Hive generic UDFs. We are trying to do some CRUD operations
on HBase tables from a Hive generic UDF. The issue is that up to Hive 0.13, Hive would generate
a MapReduce job whose execution status we could track. Since we migrated to Hive 1.0, it
doesn't show any status; it is probably streaming the data. How can we know whether it is
using multiple mappers for the job?
 
I thought this process would be pretty fast, but it is taking far longer than we estimated.
For 11.2 million records it has been running for more than 8 hours and is still in progress.
 
Use Case:
 
Let us say my table name is “MemberTable”. The generic UDF is named “Membership” and accepts
n columns as its parameters. Inside the UDF we run an internal algorithm and insert the
values into multiple HBase tables.
 
Sample Query:
 
CREATE TEMPORARY FUNCTION membership AS 'com.fishbowl.udf.membership';
 
SELECT membership(c1,c2,c3,c4,c5,c6,c7) FROM MemberTable;
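
One way to check how Hive intends to run a query like this before running it is EXPLAIN. A sketch, assuming the function has already been registered in the session; the column list is shortened for illustration:

```sql
-- Show the execution plan without running the query.
EXPLAIN
SELECT membership(c1, c2, c3) FROM MemberTable;
```

As far as I know, a plan that contains a Map Reduce stage means the query runs as a cluster job with mappers, while a plan that contains only a Fetch Operator stage means the rows are read and the UDF is evaluated by a single process on the Hive side (the CLI or HiveServer2), with no parallel mappers.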
 
 
Cluster info:
4 Node cluster (each 32 GB)
Hive version: 1.0
HBase version: 0.98.12
Distro: MapR
 



Thanks in advance!



PS: This is really urgent, I hope someone can help us asap.



Thank you,
Yogesh