hive-dev mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
Date Tue, 09 Sep 2014 09:16:29 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-8017:
-------------------------
    Attachment: HIVE-8017.2-spark.patch

This patch fixes some of the qfile tests that failed because of the previous patch.
Two qtests are still not fixed: {{optimize_nullscan.q}} and {{union_remove_25.q}}.
For {{optimize_nullscan.q}}, I checked the corresponding MR output and found that the operator
tree in the new output file is closer to the one in the MR version output. Besides, this
failure is of age 2, so I guess it's not related to this patch.
For {{union_remove_25.q}}, the only diff is the total size of {{outputTbl2}} (6812 -> 6826).
I checked the MR version and the total size there is also 6812. I'm not sure what causes this difference.
We may need to do more tests on partitioned tables.
[~xuefuz], do you have any idea on this?

> Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-8017
>                 URL: https://issues.apache.org/jira/browse/HIVE-8017
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch
>
>
> HiveKey should be used as the key type because it holds the hash code for partitioning.
> While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}}
> for more complicated ones, e.g. join, bucketed table, etc.
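
To illustrate the point in the description, here is a minimal, self-contained sketch. It does not use the real Hive or Hadoop classes: {{BytesKey}} and {{CarriedHashKey}} are hypothetical stand-ins for BytesWritable and HiveKey, and {{partitionFor}} is a generic hash partitioner assumed for illustration. The scenario (two rows that share a bucketing value but differ in the rest of the key bytes) is also an assumption.

```java
import java.util.Arrays;

public class KeyPartitionSketch {

    // Hypothetical stand-in for BytesWritable: the hash depends on all key bytes.
    static class BytesKey {
        final byte[] bytes;

        BytesKey(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
        }
    }

    // Hypothetical stand-in for HiveKey: same bytes, but the hash code is
    // supplied by whoever produced the key (e.g. derived from the bucketing column).
    static class CarriedHashKey extends BytesKey {
        final int hash;

        CarriedHashKey(byte[] bytes, int hash) {
            super(bytes);
            this.hash = hash;
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    // Generic hash partitioner: non-negative hash modulo the partition count.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Two rows with the same bucketing value (first byte) but different trailing bytes.
        byte[] row1 = {1, 2};
        byte[] row2 = {1, 3};

        // Hashing the raw bytes may scatter the rows across partitions.
        System.out.println(partitionFor(new BytesKey(row1), numPartitions));
        System.out.println(partitionFor(new BytesKey(row2), numPartitions));

        // A carried hash (assumed here to be the bucketing value) keeps them together.
        System.out.println(partitionFor(new CarriedHashKey(row1, 1), numPartitions));
        System.out.println(partitionFor(new CarriedHashKey(row2, 1), numPartitions));
    }
}
```

In this sketch the byte-based keys can end up in different partitions, while the keys that carry their own hash code always go to the same one, which is the behavior joins and bucketed tables rely on.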



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
