hive-issues mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16919) Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?
Date Wed, 21 Jun 2017 04:42:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056981#comment-16056981 ]

Matt McCline commented on HIVE-16919:
-------------------------------------

First one (positions counted back from the end of the select list):

4 back: ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),

11 back is MIN(cint): -1069736047

1st is MAX(cint): -20301111
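
As a cross-check on the values above, the following sketch (not from the thread) evaluates the column expression for MAX(cint) = -20301111 twice in plain Java: once in 32-bit int arithmetic, where MAX(cint) * -3728 wraps around, and once in 64-bit long arithmetic. The class and method names are made up for illustration, and it assumes that % behaves like Java's truncated remainder. The 32-bit path yields 511 and the 64-bit path yields -58, the two values reported for this field. That is consistent with the intermediate product being held at different widths in the two runs, though nothing in this thread confirms that as the cause.

```java
class ModuloWidthDemo {
    // Column expression evaluated entirely in 32-bit int arithmetic.
    // The product MAX(cint) * -3728 overflows int and wraps around.
    static int expr32(int maxCint) {
        int product = maxCint * -3728;
        return (-product) % (-563 % product);
    }

    // Same expression evaluated in 64-bit long arithmetic (no overflow for this input).
    static long expr64(long maxCint) {
        long product = maxCint * -3728L;
        return (-product) % (-563L % product);
    }

    public static void main(String[] args) {
        int maxCint = -20301111; // MAX(cint) value quoted in the comment above
        System.out.println("32-bit: " + expr32(maxCint)); // prints 32-bit: 511
        System.out.println("64-bit: " + expr64(maxCint)); // prints 64-bit: -58
    }
}
```

The exact product is 75682541808, which does not fit in a signed 32-bit int; wrapped, it becomes -1626869520, and the outer negation and modulo then operate on a different number.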


> Vectorization: vectorization_short_regress.q has query result differences with non-vectorized
run.  Vectorized unary function broken?
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16919
>                 URL: https://issues.apache.org/jira/browse/HIVE-16919
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>
> Jason spotted a difference in the query result for vectorization_short_regress.q.out
-- that is, when vectorization is turned off and a base .q.out file is created, there are 2 differences.
> They both seem to be related to negation.  For example, in the first one MAX(cint) and
MIN(cint) appear earlier as columns and match between non-vec and vec.  So, it doesn't appear that
aggregation is failing.  It seems that now that the Reducer is vectorizing, a
bug is exposed.  So, even though MAX and MIN are the same, the expression with negation returns
different results.
> 19th field of the query below: Vectorized 511 vs Non-Vectorized -58
> {noformat}
> SELECT MAX(cint),
>        (MAX(cint) / -3728),
>        (MAX(cint) * -3728),
>        VAR_POP(cbigint),
>        (-((MAX(cint) * -3728))),
>        STDDEV_POP(csmallint),
>        (-563 % (MAX(cint) * -3728)),
>        (VAR_POP(cbigint) / STDDEV_POP(csmallint)),
>        (-(STDDEV_POP(csmallint))),
>        MAX(cdouble),
>        AVG(ctinyint),
>        (STDDEV_POP(csmallint) - 10.175),
>        MIN(cint),
>        ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
>        (-(MAX(cdouble))),
>        MIN(cdouble),
>        (MAX(cdouble) % -26.28),
>        STDDEV_SAMP(csmallint),
>        (-((MAX(cint) / -3728))),
>        ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
>        ((MAX(cint) / -3728) - AVG(ctinyint)),
>        (-((MAX(cint) * -3728))),
>        VAR_SAMP(cint)
> FROM   alltypesorc
> WHERE  (((cbigint <= 197)
>          AND (cint < cbigint))
>         OR ((cdouble >= -26.28)
>             AND (csmallint > cdouble))
>         OR ((ctinyint > cfloat)
>             AND (cstring1 RLIKE '.*ss.*'))
>            OR ((cfloat > 79.553)
>                AND (cstring2 LIKE '10%')))
> {noformat}
> Column expression is:  ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
> -----------------------------------------------
> This is a previously existing issue, now filed as HIVE-16919: "Vectorization: vectorization_short_regress.q
has query result differences with non-vectorized run"
> Query result for vectorization_short_regress.q.out -- that is, when vectorization is turned
off and a base .q.out file is created.
> -----------------------------------------------
> 10th field of the query below: Non-Vectorized -6432.000015344526 vs. Vectorized -6432.0
> Column expression is (-(cdouble)) as c4,
> {noformat}
> SELECT   ctimestamp1,
>          cstring2,
>          cdouble,
>          cfloat,
>          cbigint,
>          csmallint,
>          (cbigint / 3569) as c1,
>          (-257 - csmallint) as c2,
>          (-6432 * cfloat) as c3,
>          (-(cdouble)) as c4,
>          (cdouble * 10.175) as c5,
>          ((-6432 * cfloat) / cfloat) as c6,
>          (-(cfloat)) as c7,
>          (cint % csmallint) as c8,
>          (-(cdouble)) as c9,
>          (cdouble * (-(cdouble))) as c10
> FROM     alltypesorc
> WHERE    (((-1.389 >= cint)
>            AND ((csmallint < ctinyint)
>                 AND (-6432 > csmallint)))
>           OR ((cdouble >= cfloat)
>               AND (cstring2 <= 'a'))
>              OR ((cstring1 LIKE 'ss%')
>                  AND (10.175 > cbigint)))
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
