hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bennie Leo <tben...@hotmail.com>
Subject RE: Limiting outer join
Date Tue, 07 Jul 2015 20:55:53 GMT
Never mind, I got it working with UDF. I just pass the file location to my evaluate function.
Thanks! :)
 
From: tbenleo@hotmail.com
To: user@hive.apache.org
Subject: RE: Limiting outer join
Date: Tue, 7 Jul 2015 09:59:22 -0700




Thanks for your replies.
 
I see how extracting the first country would work, however I was hoping to speed up my query
by stopping the search once a country has been found.
 
Are you suggesting that I pass the whole IP table to a UDF and perform the search myself?
I've only programmed simple UDFs so far (ex: reformat a string), so any additional details
would be appreciated. I am mostly concerned about importing said table (currently stored in
Hive) and distributing the task across nodes (note that I use Tez).
 
Regards,
B
 
> Date: Mon, 6 Jul 2015 18:18:44 -0700
> Subject: Re: Limiting outer join
> From: gopalv@apache.org
> To: user@hive.apache.org
> 
> 
> > In the following query, it is possible to limit the amount of entries
> >returned by an outer join to a single value? I want to obtain a single
> >country from ipv4geotable for each entry in logontable.
> 
> Yes, the PTF DENSE_RANK()/ROW_NUMBER() basically gives you that - you can
> read the first row out of each logon.IP except, there¹s no way to force
> which country wins over the other without an order by country in the
> OVER() clause as well.
> 
> That said, it will only get slower to produce 1 row per group, because of
> the distributed nature of the SQL engine, the reduction of data happens
> after a ordering shuffle.
> 
> You¹re doing range joins in a SQL engine without theta joins and MapReduce
> had no way to implement those at runtime (Tez has, with EdgeManager
> plugins).
> 
> The easiest/traditional approach out of doing geo-IP lookups is a compact
> UDF model without any joins at all.
> 
> There¹s some old threads on discussing this as a built-in & some code
> (with potential licensing issues) -
> http://markmail.org/message/w54j4upwg2wbh3xg
> 
> Cheers,
> Gopal
> 
> 
 		 	   		   		 	   		  
Mime
View raw message