hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Pullokkaran <jpullokka...@hortonworks.com>
Subject Re: Hive indexing optimization
Date Fri, 26 Jun 2015 01:27:21 GMT
Set hive.optimize.index.filter=true;

Thanks
John

From: Bennie Leo <tbenleo@hotmail.com<mailto:tbenleo@hotmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>>
Date: Thursday, June 25, 2015 at 5:48 PM
To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>>
Subject: Hive indexing optimization


Hi,

I am attempting to optimize a query using indexing. My current query converts an ipv4 address
to a country using a geolocation table. However, the geolocation table is fairly large and
the query takes an impractical amount of time. I have created indexes and set the binary search
parameter to true (default), but the query is not faster.

Here is how I set up indexing:

DROPINDEXIFEXISTS ipv4indexes ON ipv4geotable;
CREATEINDEX ipv4indexes
ONTABLE ipv4geotable (StartIp, EndIp)
AS'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITHDEFERREDREBUILD
IDXPROPERTIES ('hive.index.compact.binary.search'='true');

ALTERINDEX ipv4indexes ON ipv4geotable REBUILD;

And here is my query:

DROPTABLEIFEXISTS ipv4table;
CREATETABLE ipv4table AS
SELECT logon.IP, ipv4.Country
FROM
(SELECT * FROM logontable WHERE isIpv4(IP)) logon
LEFTOUTERJOIN
(SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4 ON isIpv4(logon.IP)
WHERE ipv4.StartIp <= logon.IPANDlogon.IP <= ipv4.EndIp;

What the query is doing is extracting an IP from logontable and finding in which range it
lies within the geolocation table (which is sorted). When a range is found, the corresponding
country is returned. The problem is that Hive goes through the whole table row by row rather
than performing a smart search (ex: binary search).

Any suggestions on how to speed things up?

Thank you,
B

Mime
View raw message