hive-dev mailing list archives

From "Remus Rusanu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6873) DISTINCT clause in aggregates is handled incorrectly by vectorized execution
Date Thu, 10 Apr 2014 07:46:17 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965099#comment-13965099
] 

Remus Rusanu commented on HIVE-6873:
------------------------------------

[~jnp] From how I read GroupByOptimizer.java@227, I reckon there are some cases when the reduce
side does expect the mapper to have already done the correct aggregation:
{code}
          // Partial aggregation is not done for distincts on the mapper
          // However, if the data is bucketed/sorted on the distinct key, partial aggregation
          // can be performed on the mapper.
{code}
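
To illustrate why that comment matters, here is a minimal standalone sketch. It is not Hive code: the class and method names are hypothetical, and COUNT(DISTINCT) over in-memory lists of strings stands in for the real operators. Under those assumptions, it shows that summing per-split distinct counts matches the exact answer only when each value of the distinct key is confined to a single split, which is roughly what bucketing/sorting on that key provides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hive.
public class DistinctPartialAggregation {

    // "Map side": count the distinct values seen within one split only.
    public static long partialDistinctCount(List<String> split) {
        return new HashSet<>(split).size();
    }

    // "Reduce side", naive merge: sum the per-split partial counts.
    // This is correct only if no value appears in more than one split.
    public static long sumOfPartialCounts(List<List<String>> splits) {
        long total = 0;
        for (List<String> split : splits) {
            total += partialDistinctCount(split);
        }
        return total;
    }

    // Reference answer: deduplicate over the values of all splits together.
    public static long exactDistinctCount(List<List<String>> splits) {
        Set<String> seen = new HashSet<>();
        for (List<String> split : splits) {
            seen.addAll(split);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Unsorted input: "a" and "b" each occur in both splits.
        List<List<String>> unsorted = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c", "a"));
        // Input clustered on the distinct key: each value lives in one split.
        List<List<String>> clustered = Arrays.asList(
                Arrays.asList("a", "a", "a"),
                Arrays.asList("b", "b", "c"));

        System.out.println("unsorted:  partial sum = " + sumOfPartialCounts(unsorted)
                + ", exact = " + exactDistinctCount(unsorted));
        System.out.println("clustered: partial sum = " + sumOfPartialCounts(clustered)
                + ", exact = " + exactDistinctCount(clustered));
    }
}
```

On the unsorted input the partial sum overcounts, while on the clustered input the two numbers agree. So a mapper may pre-aggregate a DISTINCT only when the data layout guarantees that kind of clustering.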

> DISTINCT clause in aggregates is handled incorrectly by vectorized execution
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-6873
>                 URL: https://issues.apache.org/jira/browse/HIVE-6873
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0, 0.14.0
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>         Attachments: HIVE-6873.1.patch, HIVE-6873.2.patch
>
>
> The vectorized aggregates ignore the DISTINCT clause. This causes incorrect results. Because
of how GroupByOperatorDesc adds the DISTINCT keys to the overall aggregate keys, the vectorized
aggregates do account for the extra key, but they do not process the data correctly for that
key. The reduce side then aggregates the input from the vectorized map side into results that
are only sometimes correct but mostly incorrect. HIVE-4607 tracks the proper fix, but in the
meantime I'm filing a bug to disable vectorized execution if DISTINCT is present. The fix is trivial.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
