hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12664) Bug in reduce deduplication optimization causing ArrayOutOfBoundException
Date Tue, 05 Jan 2016 00:33:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082094#comment-15082094
] 

Ashutosh Chauhan commented on HIVE-12664:
-----------------------------------------

[~johang] Seems like you are not on master branch. your posted stacktrace doesnt match with
master. Nor provided patch applies. You need to rebase patch for master. Also, if you can
provide a query which reproduces issue on master that will be great.

> Bug in reduce deduplication optimization causing ArrayOutOfBoundException
> -------------------------------------------------------------------------
>
>                 Key: HIVE-12664
>                 URL: https://issues.apache.org/jira/browse/HIVE-12664
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.1, 1.2.1
>            Reporter: Johan Gustavsson
>            Assignee: Johan Gustavsson
>         Attachments: HIVE-12664-1.patch, HIVE-12664.1.patch, HIVE-12664.patch
>
>
> The optimisation check for reduce deduplication only checks the first child node for
join -and the check itself also contains a major bug- causing ArrayOutOfBoundException no
matter what.
> Sample data table form:
> ||time||user||host||path||referer||code||agent||size||method||
> |int|string|string|string|string|bigint|string|bigint|string|
> Sample query
> {code:sql}
> SELECT 
>   t1.host,
>   COUNT(DISTINCT t1.`date`) AS login_count,
>   MAX(t2.code) AS code,
>   unix_timestamp() AS time
> FROM (
>     SELECT 
>       HOST,
>       MIN(time) AS DATE
>     FROM
>       www_access
>     WHERE
>       HOST IS NOT NULL
>     GROUP BY
>       HOST
>   ) t1
> JOIN (
>     SELECT 
>       HOST,
>       MIN(time) AS code
>     FROM
>       www_access
>     WHERE
>       HOST IS NOT NULL
>     GROUP BY
>       HOST
>   ) t2
>   ON t1.host = t2.host
> GROUP BY
>   t1.host
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message