hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Hive Query with UDF
Date Thu, 18 Oct 2012 02:21:28 GMT
You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have to maintain it and keep it in lockstep with Hive from Apache.


There are other options and designs, but since this is for a commercial product, I'm not going to talk about them.

Keep in mind that Hive isn't a relational database per se and works on immutable flat files.
So that's going to hurt you as well. 
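
A minimal sketch of the simpler route suggested earlier in this thread (quoted below): encrypt the literal once on the client and put the resulting constant into the query, so Hive does a plain string comparison per row instead of calling a UDF. The class name, the quote helper, and the Base64-based encrypt method are placeholders made up for illustration; encrypt merely stands in for ParamData.encrypt and is not encryption at all. This only works if the real scheme is deterministic, i.e. the same plaintext always yields the same ciphertext, which the original query already assumes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncryptedLiteralQuery {

    // Hypothetical stand-in for ParamData.encrypt(String). Base64 is NOT
    // encryption; it is used here only because it is deterministic.
    public static String encrypt(String plain) {
        return Base64.getEncoder()
                .encodeToString(plain.getBytes(StandardCharsets.UTF_8));
    }

    // Wrap a value in single quotes, backslash-escaping backslashes and
    // single quotes so it is safe inside a HiveQL string literal.
    public static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Encrypt the parameter once, client-side, and embed it as a constant.
    public static String buildQuery(String firstName) {
        return "select * from cc_details where first_name = "
                + quote(encrypt(firstName));
    }

    public static void main(String[] args) {
        // Prints the query text with the encrypted constant in place.
        System.out.println(buildQuery("Ann"));
    }
}
```

The printed statement can then be submitted like any other query, for example with hive -e, and users of a wrapper script would still only ever type the plaintext value.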

On Oct 17, 2012, at 9:13 PM, lohit <lohit.vijayarenu@gmail.com> wrote:

> One idea is to write your own translation layer that sits between the query and the actual job submission.
> You would most likely end up with your own version of the Hive jar, with your translation changes on top of the Hive sources.
> This has the added advantage that users need not change their queries; they would write a normal Hive query, like
>    select * from cc_details where first_name = 'Ann'
> Disadvantage is you have to maintain a fork. 
> 
> Even otherwise, my initial guess is that you might have to modify the command-line parser so that it encrypts once instead of for every record.
> 
> 2012/10/17 Sam Mohamed <sam.mohamed@voltage.com>
> Thanks for the quick response.
> 
> The idea is that we are selling the encryption product to customers who use HDFS. Hence, encryption is a requirement.
> 
> Any other suggestions?
> 
> Sam
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
> 
> You don't need a UDF...
> 
> You encrypt the string 'Ann' first then use that encrypted value in the Select statement.
> 
> That should make things a bit simpler.
> 
> 
> 
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sam.mohamed@voltage.com> wrote:
> 
> > I have some encrypted data in an HDFS CSV file that I've created a Hive table for, and I want to run a Hive query that first encrypts the query param, then does the lookup. I have a UDF that does encryption as follows:
> >
> > import org.apache.hadoop.hive.ql.exec.UDF;
> > import org.apache.hadoop.io.Text;
> >
> > public class ParamEncrypt extends UDF {
> >
> >  // Returns the encrypted form of name, or null for a null input.
> >  public Text evaluate(String name) throws Exception {
> >
> >      if (name == null) { return null; }
> >
> >      return new Text(ParamData.encrypt(name));
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is that it runs encrypt('Ann') for every single record in the table. I want it to do the encryption once, then do the match. I've tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann') from cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying **ann** is not a known column.
> >
> > Thanks,
> >
> > Sam
> 
> 
> 
> 
> -- 
> Have a Nice Day!
> Lohit

