hadoop-hive-dev mailing list archives

From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-222) Group by on a combination of distinct and non distinct aggregates can return serialization errors with map side aggregations.
Date Sat, 10 Jan 2009 17:22:01 GMT

     [ https://issues.apache.org/jira/browse/HIVE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-222:
-------------------------------

    Attachment: patch-222.txt

Fix for the bug.

There was a bug in the way the aggregation list was being generated for the map side aggregation.
As a result, the ordering of the aggregations in the map side groupby operator and the reduce
side groupby operator could differ, so a partial aggregate of one type (for example, a Long)
ended up in a column serialized as another type (for example, a Double), leading to this problem.
Ideally, we should be using the row schema information to generate the order, but that needs a
much larger refactor of how we generate plans in the group by case. For now this patch should
fix the problem.
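To make the failure mode concrete, here is a minimal, hypothetical Java model of it. The classes and methods below (AggregationOrderDemo, mapSideRow, serialize) are invented for illustration and are not Hive's GroupByOperator or SerDe code, and the partial-result types are assumptions chosen to mirror the stack trace quoted below. When the map side lays out its partial aggregates in one order and the reduce side checks them against a different order, a Long lands in a Double column. When both sides derive the order from one shared list, the row serializes cleanly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the HIVE-222 failure; these are not Hive's classes.
public class AggregationOrderDemo {

    // Assumed partial-result type of each aggregation in
    // SELECT x, count(DISTINCT y), SUM(y) FROM t GROUP BY x.
    static final Map<String, Class<?>> TYPES = Map.of(
            "count(DISTINCT y)", Long.class,
            "sum(y)", Double.class);

    // Map side: lay out the partial aggregates for one group in the given order.
    static Object[] mapSideRow(List<String> order, Map<String, Object> partials) {
        Object[] row = new Object[order.size()];
        for (int i = 0; i < order.size(); i++) {
            row[i] = partials.get(order.get(i));
        }
        return row;
    }

    // Reduce side: check each column against the type expected at that position,
    // loosely analogous to the cast that failed in DynamicSerDeTypeDouble.serialize.
    static String serialize(List<String> order, Object[] row) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < row.length; i++) {
            Class<?> expected = TYPES.get(order.get(i));
            out.add(String.valueOf(expected.cast(row[i]))); // ClassCastException on mismatch
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        Map<String, Object> partials = Map.of("count(DISTINCT y)", 3L, "sum(y)", 7.5);
        List<String> mapOrder = List.of("count(DISTINCT y)", "sum(y)");

        // Buggy: the reduce side derives its own, different order.
        List<String> reduceOrder = List.of("sum(y)", "count(DISTINCT y)");
        try {
            serialize(reduceOrder, mapSideRow(mapOrder, partials));
        } catch (ClassCastException e) {
            System.out.println("mismatch: " + e.getMessage());
        }

        // Fixed: both sides use one shared order.
        System.out.println(serialize(mapOrder, mapSideRow(mapOrder, partials)));
    }
}
```

The design point is the single source of truth for the aggregation order, which is also why the comment above suggests deriving it from the row schema in the longer term.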

There are preexisting tests that cover this (groupby2_map.q and groupby3_map.q). However, these
tests rely on an internal hashmap returning its keys in a particular order, which can mask the
bug. The bug was easily reproducible with the patch in HIVE-179. I have tested it with that patch.
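The dependence on hashmap key order can be illustrated with a small, hypothetical Java helper (IterationOrderDemo.aggregationOrder is invented here and is not part of Hive). java.util.HashMap makes no guarantee about iteration order, so an order derived from its keys may or may not match the SELECT list, and it can change when unrelated code changes. java.util.LinkedHashMap iterates in insertion order, so the derived order is stable. This sketch only shows the general hazard; it does not claim to reproduce the actual patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of why a HashMap-derived order is fragile for plan generation.
public class IterationOrderDemo {

    // Records each aggregation expression (in SELECT order) in the supplied map,
    // then returns the map's key iteration order.
    static List<String> aggregationOrder(Map<String, Integer> map, List<String> selectList) {
        for (int i = 0; i < selectList.size(); i++) {
            map.put(selectList.get(i), i);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        List<String> select = List.of("count(DISTINCT y)", "sum(y)", "max(z)", "avg(w)");
        // HashMap: iteration order is unspecified; it may or may not match the SELECT list.
        System.out.println(aggregationOrder(new HashMap<>(), select));
        // LinkedHashMap: iteration order is insertion order, i.e. the SELECT-list order.
        System.out.println(aggregationOrder(new LinkedHashMap<>(), select));
    }
}
```

A test that passes only because the HashMap order happens to agree with the expected order gives no protection once the hashing, the key set, or the surrounding code changes.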


> Group by on a combination of distinct and non distinct aggregates can return serialization errors with map side aggregations.
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-222
>                 URL: https://issues.apache.org/jira/browse/HIVE-222
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-222.txt
>
>
> For queries of the form (groupby2_map.q in the source)
> SELECT x, count(DISTINCT y), SUM(y) FROM t GROUP BY x
> when map side aggregation is turned on (hive.map.aggr=true; this is off by default),
> the following exception can occur:
>     [junit] Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double
>     [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeTypeDouble.serialize(DynamicSerDeTypeDouble.java:60)
>     [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeFieldList.serialize(DynamicSerDeFieldList.java:235)
>     [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeStructBase.serialize(DynamicSerDeStructBase.java:81)
>     [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.serialize(DynamicSerDe.java:174)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

