hive-issues mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
Date Tue, 20 Jun 2017 06:57:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047406#comment-16047406 ]

Matt McCline edited comment on HIVE-16589 at 6/20/17 6:56 AM:
--------------------------------------------------------------

Jason spotted differences in the query results for vectorization_short_regress.q.out. When
vectorization is turned off and a base .q.out file is created, there are 2 differences:

19th field of the query below: Vectorized 511 vs Non-Vectorized -58

{noformat}
SELECT MAX(cint),
       (MAX(cint) / -3728),
       (MAX(cint) * -3728),
       VAR_POP(cbigint),
       (-((MAX(cint) * -3728))),
       STDDEV_POP(csmallint),
       (-563 % (MAX(cint) * -3728)),
       (VAR_POP(cbigint) / STDDEV_POP(csmallint)),
       (-(STDDEV_POP(csmallint))),
       MAX(cdouble),
       AVG(ctinyint),
       (STDDEV_POP(csmallint) - 10.175),
       MIN(cint),
       ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
       (-(MAX(cdouble))),
       MIN(cdouble),
       (MAX(cdouble) % -26.28),
       STDDEV_SAMP(csmallint),
       (-((MAX(cint) / -3728))),
       ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
       ((MAX(cint) / -3728) - AVG(ctinyint)),
       (-((MAX(cint) * -3728))),
       VAR_SAMP(cint)
FROM   alltypesorc
WHERE  (((cbigint <= 197)
         AND (cint < cbigint))
        OR ((cdouble >= -26.28)
            AND (csmallint > cdouble))
        OR ((ctinyint > cfloat)
            AND (cstring1 RLIKE '.*ss.*'))
           OR ((cfloat > 79.553)
               AND (cstring2 LIKE '10%')))
{noformat}

Column expression is:  (-((MAX(cint) / -3728))),

-----------------------------------------------

This is a previously existing issue, now filed as HIVE-16919: "Vectorization: vectorization_short_regress.q
has query result differences with non-vectorized run".

10th field of the query below: Non-Vectorized -6432.000015344526 vs. Vectorized -6432.0

Column expression is (-(cdouble)) as c4,

{noformat}
SELECT   ctimestamp1,
         cstring2,
         cdouble,
         cfloat,
         cbigint,
         csmallint,
         (cbigint / 3569) as c1,
         (-257 - csmallint) as c2,
         (-6432 * cfloat) as c3,
         (-(cdouble)) as c4,
         (cdouble * 10.175) as c5,
         ((-6432 * cfloat) / cfloat) as c6,
         (-(cfloat)) as c7,
         (cint % csmallint) as c8,
         (-(cdouble)) as c9,
         (cdouble * (-(cdouble))) as c10
FROM     alltypesorc
WHERE    (((-1.389 >= cint)
           AND ((csmallint < ctinyint)
                AND (-6432 > csmallint)))
          OR ((cdouble >= cfloat)
              AND (cstring2 <= 'a'))
             OR ((cstring1 LIKE 'ss%')
                 AND (10.175 > cbigint)))
{noformat}


was (Author: mmccline):

Jason spotted a difference in the query result for vectorization_short_regress.q.out -- that
is when vectorization is turned off and a base .q.out file created, there are 2 differences:

19th field of the query below: Vectorized 511 vs Non-Vectorized -58

{noformat}
SELECT MAX(cint),
       (MAX(cint) / -3728),
       (MAX(cint) * -3728),
       VAR_POP(cbigint),
       (-((MAX(cint) * -3728))),
       STDDEV_POP(csmallint),
       (-563 % (MAX(cint) * -3728)),
       (VAR_POP(cbigint) / STDDEV_POP(csmallint)),
       (-(STDDEV_POP(csmallint))),
       MAX(cdouble),
       AVG(ctinyint),
       (STDDEV_POP(csmallint) - 10.175),
       MIN(cint),
       ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
       (-(MAX(cdouble))),
       MIN(cdouble),
       (MAX(cdouble) % -26.28),
       STDDEV_SAMP(csmallint),
       (-((MAX(cint) / -3728))),
       ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
       ((MAX(cint) / -3728) - AVG(ctinyint)),
       (-((MAX(cint) * -3728))),
       VAR_SAMP(cint)
FROM   alltypesorc
WHERE  (((cbigint <= 197)
         AND (cint < cbigint))
        OR ((cdouble >= -26.28)
            AND (csmallint > cdouble))
        OR ((ctinyint > cfloat)
            AND (cstring1 RLIKE '.*ss.*'))
           OR ((cfloat > 79.553)
               AND (cstring2 LIKE '10%')))
{noformat}

Column expression is:  (-((MAX(cint) / -3728))),

-----------------------------------------------

10th field of the query below: Vectorized -6432.000015344526 vs. Non-Vectorized -6432.0

Column expression is (-(cdouble)) as c4,

{noformat}
SELECT   ctimestamp1,
         cstring2,
         cdouble,
         cfloat,
         cbigint,
         csmallint,
         (cbigint / 3569) as c1,
         (-257 - csmallint) as c2,
         (-6432 * cfloat) as c3,
         (-(cdouble)) as c4,
         (cdouble * 10.175) as c5,
         ((-6432 * cfloat) / cfloat) as c6,
         (-(cfloat)) as c7,
         (cint % csmallint) as c8,
         (-(cdouble)) as c9,
         (cdouble * (-(cdouble))) as c10
FROM     alltypesorc
WHERE    (((-1.389 >= cint)
           AND ((csmallint < ctinyint)
                AND (-6432 > csmallint)))
          OR ((cdouble >= cfloat)
              AND (cstring2 <= 'a'))
             OR ((cstring1 LIKE 'ss%')
                 AND (10.175 > cbigint)))
{noformat}

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16589
>                 URL: https://issues.apache.org/jira/browse/HIVE-16589
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, HIVE-16589.03.patch, HIVE-16589.04.patch,
>                      HIVE-16589.05.patch, HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, HIVE-16589.091.patch,
>                      HIVE-16589.092.patch, HIVE-16589.093.patch, HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch,
>                      HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for Complex Types
> in Fast SerDe" was committed).
> Add more classes for vectorizing AVG, in preparation for fully supporting AVG GroupBy. In
> particular, add the PARTIAL2 and FINAL GroupBy modes, which take the AVG struct as input, and
> the COMPLETE mode, which takes the original data and produces the full aggregation, for
> completeness.
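
For readers unfamiliar with the GroupBy modes named in the description, here is a rough
sketch of how they relate for AVG. It is plain Python, not Hive code. The AvgPartial class
and the function names are hypothetical, and the (count, sum) pair is only an assumed
stand-in for the AVG struct mentioned above. The sketch shows which modes consume raw rows
and which consume partial results:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AvgPartial:
    """Hypothetical stand-in for the AVG struct: a running count and sum."""
    count: int = 0
    total: float = 0.0


def partial1(values: Iterable[Optional[float]]) -> AvgPartial:
    """PARTIAL1: raw values in, partial struct out. NULLs (None) are skipped."""
    state = AvgPartial()
    for v in values:
        if v is not None:
            state.count += 1
            state.total += v
    return state


def partial2(partials: Iterable[AvgPartial]) -> AvgPartial:
    """PARTIAL2: partial structs in, one merged partial struct out."""
    merged = AvgPartial()
    for p in partials:
        merged.count += p.count
        merged.total += p.total
    return merged


def final(partials: Iterable[AvgPartial]) -> Optional[float]:
    """FINAL: partial structs in, finished average out (None if no rows)."""
    merged = partial2(partials)
    return merged.total / merged.count if merged.count else None


def complete(values: Iterable[Optional[float]]) -> Optional[float]:
    """COMPLETE: raw values in, finished average out, in a single stage."""
    return final([partial1(values)])


if __name__ == "__main__":
    a = partial1([1, 2, None])
    b = partial1([3, 6])
    print(final([partial2([a, b])]))      # staged:  3.0
    print(complete([1, 2, None, 3, 6]))   # one step: 3.0
```

Under these assumptions the staged path (PARTIAL1, then PARTIAL2, then FINAL) and the
single-stage COMPLETE path give the same average for the same input.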



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
