hive-dev mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-12491) Statistics: 3 attribute join on a 2-source table is off
Date Sun, 22 Nov 2015 06:53:10 GMT
Gopal V created HIVE-12491:
------------------------------

             Summary: Statistics: 3 attribute join on a 2-source table is off
                 Key: HIVE-12491
                 URL: https://issues.apache.org/jira/browse/HIVE-12491
             Project: Hive
          Issue Type: Bug
            Reporter: Gopal V
            Assignee: Prasanth Jayachandran


The eased out denominator has to detect duplicate row-stats from different attributes.

{code}
  private Long getEasedOutDenominator(List<Long> distinctVals) {
    // Exponential back-off for NDVs.
    // 1) Descending order sort of NDVs
    // 2) denominator = NDV1 * (NDV2 ^ (1/2)) * (NDV3 ^ (1/4)) * ....
    Collections.sort(distinctVals, Collections.reverseOrder());

    long denom = distinctVals.get(0);
    for (int i = 1; i < distinctVals.size(); i++) {
      denom = (long) (denom * Math.pow(distinctVals.get(i), 1.0 / (1 << i)));
    }

    return denom;
  }
{code}

In this case the method is passed {{[8007986, 821974390, 821974390]}}, which is actually the NDVs
of 3 join columns, 2 of which are from the RHS table.

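So the eased out denominator is too large by a factor of 30,000 or so (the repeated 821974390
alone contributes an extra 821974390 ^ (1/2), about 28,670). The estimated join row count comes
out far too low, causing OOMs in map-joins.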
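
To make the arithmetic concrete, here is a minimal standalone sketch that runs the same back-off on the NDV list above, next to a variant that collapses equal NDVs before applying it. The class name {{EasedOutDenominatorSketch}} and the method {{easedOutDeduplicated}} are hypothetical and not part of Hive. Treating equal values as duplicates is only one assumed reading of "detect duplicate row-stats"; an actual fix may identify duplicates differently, since two unrelated columns can legitimately share an NDV.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

class EasedOutDenominatorSketch {

  // Same exponential back-off as the method quoted in this report:
  // denominator = NDV1 * NDV2^(1/2) * NDV3^(1/4) * ... over NDVs sorted descending.
  // Works on a copy so the caller's list is left untouched.
  // Like the quoted method, it does not handle an empty list.
  static long easedOut(List<Long> distinctVals) {
    List<Long> sorted = new ArrayList<>(distinctVals);
    Collections.sort(sorted, Collections.reverseOrder());

    long denom = sorted.get(0);
    for (int i = 1; i < sorted.size(); i++) {
      denom = (long) (denom * Math.pow(sorted.get(i), 1.0 / (1 << i)));
    }
    return denom;
  }

  // ASSUMPTION (not Hive's fix): equal NDVs are treated as duplicate stats
  // and collapsed to a single entry before the back-off is applied.
  static long easedOutDeduplicated(List<Long> distinctVals) {
    return easedOut(new ArrayList<>(new LinkedHashSet<>(distinctVals)));
  }

  public static void main(String[] args) {
    // NDV list taken from this report.
    List<Long> ndvs = Arrays.asList(8007986L, 821974390L, 821974390L);

    long raw = easedOut(ndvs);
    long deduped = easedOutDeduplicated(ndvs);

    System.out.println("as reported:  " + raw);
    System.out.println("deduplicated: " + deduped);
    System.out.println("ratio:        " + ((double) raw / deduped));
  }
}
```

Under this particular assumption the two denominators differ by a factor of a few hundred rather than the full square-root term, because dropping the duplicate also promotes 8007986 from the 1/4 exponent to the 1/2 exponent. How large the real correction is depends on how duplicates end up being detected.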



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
