hive-user mailing list archives

From fab wol <darkwoll...@gmail.com>
Subject Best way to avoid cross join
Date Wed, 05 Mar 2014 13:17:24 GMT
Hey everyone,

Before I write a lot of text, here is a thread that already describes the
problem:
http://www.sqlservercentral.com/Forums/Topic1328496-360-1.aspx

The first post there addresses a problem very similar to mine. Currently my
implementation looks like this:

SELECT id1,
  MAX(
  CASE
    WHEN m.keyword IS NULL
    THEN 0
    WHEN instr(m.keyword, prep_kw.keyword) > 0
    THEN 1
    ELSE 0
  END) AS flag
FROM (SELECT id1, keyword FROM import1) m
CROSS JOIN
  (SELECT keyword FROM et_keywords) prep_kw
GROUP BY id1;
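
To make clear what the query is supposed to compute, here is a small sketch on made-up data. It assumes Python's built-in sqlite3 module as a stand-in for Hive, purely to show the semantics (flag = 1 if any keyword from et_keywords occurs inside the row's keyword string). The sample rows, the helper name keyword_flags and the added ORDER BY are illustrative assumptions, not part of my real setup, and it says nothing about how Hive plans the join.

```python
import sqlite3


def keyword_flags(import_rows, keywords):
    """Run the cross-join query on an in-memory SQLite database.

    import_rows: list of (id1, keyword) tuples (hypothetical sample data)
    keywords:    list of keyword strings to search for
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE import1 (id1 INTEGER, keyword TEXT)")
    conn.execute("CREATE TABLE et_keywords (keyword TEXT)")
    conn.executemany("INSERT INTO import1 VALUES (?, ?)", import_rows)
    conn.executemany(
        "INSERT INTO et_keywords VALUES (?)", [(k,) for k in keywords]
    )
    # Same query as above; ORDER BY is added only to make the output stable.
    rows = conn.execute(
        """
        SELECT id1,
          MAX(
          CASE
            WHEN m.keyword IS NULL
            THEN 0
            WHEN instr(m.keyword, prep_kw.keyword) > 0
            THEN 1
            ELSE 0
          END) AS flag
        FROM (SELECT id1, keyword FROM import1) m
        CROSS JOIN
          (SELECT keyword FROM et_keywords) prep_kw
        GROUP BY id1
        ORDER BY id1
        """
    ).fetchall()
    conn.close()
    return rows


sample = [(1, "cheap red shoes"), (2, "blue hat"), (3, None)]
print(keyword_flags(sample, ["shoes", "boots"]))
# [(1, 1), (2, 0), (3, 0)]
```

Every row of import1 is compared against every row of et_keywords, so the work grows with the product of the two table sizes.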

Because of the cross join, the execution gets pinned to a single reducer
and takes ages to complete.

The thread I linked solves this with some SQL Server-specific tactics.
But I was wondering whether anybody has already run into this problem in Hive
and found a better way to solve it.

I'm using Hive 0.11 on a MapR distribution, in case that matters.

Cheers
Wolli
