hive-issues mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16919) Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?
Date Wed, 21 Jun 2017 04:36:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-16919:
--------------------------------
    Description: 
Jason spotted differences in the query results for vectorization_short_regress.q.out: when vectorization is turned
off and a base .q.out file is created, there are 2 differences.

They both seem to be related to negation. For example, in the first one, MAX(cint) and MIN(cint) appear earlier
as columns and match between the non-vectorized and vectorized runs, so aggregation does not appear to be failing.
It seems that, now that the Reducer is vectorizing, a bug is exposed: even though the MAX and MIN values are the
same, the expression with negation returns different results.

19th field of the query below: Vectorized 511 vs Non-Vectorized -58

{noformat}
SELECT MAX(cint),
       (MAX(cint) / -3728),
       (MAX(cint) * -3728),
       VAR_POP(cbigint),
       (-((MAX(cint) * -3728))),
       STDDEV_POP(csmallint),
       (-563 % (MAX(cint) * -3728)),
       (VAR_POP(cbigint) / STDDEV_POP(csmallint)),
       (-(STDDEV_POP(csmallint))),
       MAX(cdouble),
       AVG(ctinyint),
       (STDDEV_POP(csmallint) - 10.175),
       MIN(cint),
       ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
       (-(MAX(cdouble))),
       MIN(cdouble),
       (MAX(cdouble) % -26.28),
       STDDEV_SAMP(csmallint),
       (-((MAX(cint) / -3728))),
       ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
       ((MAX(cint) / -3728) - AVG(ctinyint)),
       (-((MAX(cint) * -3728))),
       VAR_SAMP(cint)
FROM   alltypesorc
WHERE  (((cbigint <= 197)
         AND (cint < cbigint))
        OR ((cdouble >= -26.28)
            AND (csmallint > cdouble))
        OR ((ctinyint > cfloat)
            AND (cstring1 RLIKE '.*ss.*'))
           OR ((cfloat > 79.553)
               AND (cstring2 LIKE '10%')))
{noformat}

Column expression: ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728)))
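To narrow this down, the differing expression can be evaluated on its own next to MAX(cint), once in row mode and once with vectorization (including the reduce side) enabled. This is a sketch under assumptions rather than the .q test itself: it presumes a Hive session where the alltypesorc test table is loaded, relies on the hive.vectorized.execution.enabled and hive.vectorized.execution.reduce.enabled settings, and reuses the WHERE clause above so that MAX(cint) should match the original run.

{noformat}
-- Sketch only: assumes a Hive session with the alltypesorc test table.
-- The WHERE clause is copied unchanged from the query above.

-- Run 1: row mode (non-vectorized) baseline.
SET hive.vectorized.execution.enabled=false;
SELECT MAX(cint),
       ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728)))
FROM   alltypesorc
WHERE  (((cbigint <= 197) AND (cint < cbigint))
        OR ((cdouble >= -26.28) AND (csmallint > cdouble))
        OR ((ctinyint > cfloat) AND (cstring1 RLIKE '.*ss.*'))
        OR ((cfloat > 79.553) AND (cstring2 LIKE '10%')));

-- Run 2: vectorized, including the reduce side, where the expression
-- over MAX(cint) is evaluated after the final aggregation.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;
SELECT MAX(cint),
       ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728)))
FROM   alltypesorc
WHERE  (((cbigint <= 197) AND (cint < cbigint))
        OR ((cdouble >= -26.28) AND (csmallint > cdouble))
        OR ((ctinyint > cfloat) AND (cstring1 RLIKE '.*ss.*'))
        OR ((cfloat > 79.553) AND (cstring2 LIKE '10%')));
{noformat}

If MAX(cint) agrees between the two runs while the second column does not, the difference lies in how the expression over the aggregate is evaluated rather than in the aggregation itself. Reduce-side vectorization likely only takes effect with the Tez (or Spark) execution engine, so on MapReduce the second run may not exercise the vectorized reducer at all.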

-----------------------------------------------

This is a previously existing issue, now filed as HIVE-16919: "Vectorization: vectorization_short_regress.q
has query result differences with non-vectorized run".

Query result difference in vectorization_short_regress.q.out, found when vectorization is turned off and a base
.q.out file is created.

-----------------------------------------------

10th field of the query below: Non-Vectorized -6432.000015344526 vs. Vectorized -6432.0

Column expression: (-(cdouble)) as c4

{noformat}
SELECT   ctimestamp1,
         cstring2,
         cdouble,
         cfloat,
         cbigint,
         csmallint,
         (cbigint / 3569) as c1,
         (-257 - csmallint) as c2,
         (-6432 * cfloat) as c3,
         (-(cdouble)) as c4,
         (cdouble * 10.175) as c5,
         ((-6432 * cfloat) / cfloat) as c6,
         (-(cfloat)) as c7,
         (cint % csmallint) as c8,
         (-(cdouble)) as c9,
         (cdouble * (-(cdouble))) as c10
FROM     alltypesorc
WHERE    (((-1.389 >= cint)
           AND ((csmallint < ctinyint)
                AND (-6432 > csmallint)))
          OR ((cdouble >= cfloat)
              AND (cstring2 <= 'a'))
             OR ((cstring1 LIKE 'ss%')
                 AND (10.175 > cbigint)))
{noformat}
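Because negating a double only flips its sign, c4 should be exactly the negation of the cdouble value in the same row, so selecting the two side by side under each mode shows which run deviates. This is a sketch under the same assumptions as before: a Hive session with the alltypesorc test table and the hive.vectorized.execution.enabled setting, with the filter copied from the query above. There is no ORDER BY, so the rows should be sorted before the two outputs are diffed.

{noformat}
-- Sketch only: assumes a Hive session with the alltypesorc test table.
-- The WHERE clause is copied unchanged from the query above.

-- Run 1: row mode (non-vectorized).
SET hive.vectorized.execution.enabled=false;
SELECT cdouble,
       (-(cdouble)) as c4
FROM   alltypesorc
WHERE  (((-1.389 >= cint) AND ((csmallint < ctinyint) AND (-6432 > csmallint)))
        OR ((cdouble >= cfloat) AND (cstring2 <= 'a'))
        OR ((cstring1 LIKE 'ss%') AND (10.175 > cbigint)));

-- Run 2: vectorized.
SET hive.vectorized.execution.enabled=true;
SELECT cdouble,
       (-(cdouble)) as c4
FROM   alltypesorc
WHERE  (((-1.389 >= cint) AND ((csmallint < ctinyint) AND (-6432 > csmallint)))
        OR ((cdouble >= cfloat) AND (cstring2 <= 'a'))
        OR ((cstring1 LIKE 'ss%') AND (10.175 > cbigint)));
{noformat}

A narrowed projection like this may produce a different vectorized plan than the full query, so if it does not reproduce the difference, comparing cdouble against c4 within the original query is the safer check.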

  was:
Query result difference in vectorization_short_regress.q.out, found when vectorization is turned off and a base
.q.out file is created.

-----------------------------------------------

10th field of the query below: Non-Vectorized -6432.000015344526 vs. Vectorized -6432.0

Column expression: (-(cdouble)) as c4

{noformat}
SELECT   ctimestamp1,
         cstring2,
         cdouble,
         cfloat,
         cbigint,
         csmallint,
         (cbigint / 3569) as c1,
         (-257 - csmallint) as c2,
         (-6432 * cfloat) as c3,
         (-(cdouble)) as c4,
         (cdouble * 10.175) as c5,
         ((-6432 * cfloat) / cfloat) as c6,
         (-(cfloat)) as c7,
         (cint % csmallint) as c8,
         (-(cdouble)) as c9,
         (cdouble * (-(cdouble))) as c10
FROM     alltypesorc
WHERE    (((-1.389 >= cint)
           AND ((csmallint < ctinyint)
                AND (-6432 > csmallint)))
          OR ((cdouble >= cfloat)
              AND (cstring2 <= 'a'))
             OR ((cstring1 LIKE 'ss%')
                 AND (10.175 > cbigint)))
{noformat}


> Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16919
>                 URL: https://issues.apache.org/jira/browse/HIVE-16919
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
