hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Pullokkaran <jpullokka...@hortonworks.com>
Subject Re: Hive indexing optimization
Date Sat, 27 Jun 2015 16:02:08 GMT
"SELECT StartIp, EndIp, Country FROM ipv4geotable” should have been
rewritten as a scan against index table.

BitMap Indexes seems to support inequalities (<=, <, >=).

Post the explain plan.

On 6/26/15, 8:56 PM, "Gopal Vijayaraghavan" <gopalv@apache.org> wrote:

>Hi,
>
>Hive indexes won¹t really help you speed up that query right now, because
>of the plan it generates due to the <= clauses.
>
>> CREATETABLE ipv4table
>> AS
>> SELECT logon.IP, ipv4.Country
>> FROM
>> (SELECT * FROM logontable WHERE isIpv4(IP)) logon
>> LEFT OUTER JOIN
>> (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4 ON
>> isIpv4(logon.IP)
>> WHERE ipv4.StartIp <=logon.IP AND logon.IP <= ipv4.EndIp;
>
>That¹s a cross-product join, which can¹t be distributed at all & will take
>forever, even if you use Tez/LLAP.
>
>Range join queries have partial distribution rewrites, but AFAIK none of
>them use filter indexes in hive.
>
>But before I suggest a rewrite, can you post the ³explain <query>;² for
>the above query, so that I can check if this is indeed producing a
>cross-product + filter?
>
>Cheers,
>Gopal
>
>
>
>
>
>
>
>

Mime
View raw message