hive-dev mailing list archives

From "Chengxiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
Date Wed, 10 Sep 2014 08:45:29 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Chengxiang Li updated HIVE-7776:
--------------------------------
    Attachment: HIVE-7776.1-spark.patch

Hive obtains the task ID in one of two ways in Utilities::getTaskId:
# read the value of mapred.task.id from the configuration.
# generate a random value if #1 returns null.
Currently, Hive on Spark cannot get the value of mapred.task.id from the configuration, so
each call falls back to #2.

FileSinkOperator uses the task ID to distinguish bucket file names. Because one
FileSinkOperator instance is referenced by only one task, it should hold the task ID in a
field and initialize it only once. Instead, FileSinkOperator calls Utilities::getTaskId to
get a new task ID each time. As a result, more bucket files are created than there are
buckets, which leads to unexpected results for tablesample queries.
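
The effect can be illustrated with a minimal, self-contained sketch. It is not the code from
the attached patch: the class TaskIdSketch, its method names, and the "taskId_bucket" file
naming are hypothetical, and getTaskId below is only a stand-in that mimics the two-step
lookup described above.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

class TaskIdSketch {
    // Hypothetical stand-in for Utilities::getTaskId: use the configured
    // value when present, otherwise fall back to a random one.
    static String getTaskId(String configuredTaskId, Random rng) {
        if (configuredTaskId != null) {
            return configuredTaskId;
        }
        return Integer.toString(rng.nextInt(Integer.MAX_VALUE));
    }

    // Problem pattern: the task id is looked up again for every write,
    // so a missing configured id yields a new file name each time.
    static Set<String> namesWithLookupPerWrite(String configuredTaskId, int numBuckets,
                                               int writesPerBucket, Random rng) {
        Set<String> names = new LinkedHashSet<>();
        for (int w = 0; w < writesPerBucket; w++) {
            for (int b = 0; b < numBuckets; b++) {
                // Assumed naming scheme, for illustration only.
                names.add(getTaskId(configuredTaskId, rng) + "_" + b);
            }
        }
        return names;
    }

    // Suggested pattern: look the task id up once and reuse it,
    // as a field initialized once per operator instance would.
    static Set<String> namesWithCachedTaskId(String configuredTaskId, int numBuckets,
                                             int writesPerBucket, Random rng) {
        String taskId = getTaskId(configuredTaskId, rng);
        Set<String> names = new LinkedHashSet<>();
        for (int w = 0; w < writesPerBucket; w++) {
            for (int b = 0; b < numBuckets; b++) {
                names.add(taskId + "_" + b);
            }
        }
        return names;
    }
}
```

With a null configured ID (the Hive on Spark case described above), the per-write lookup
produces more distinct file names than buckets, while the cached variant produces exactly
one name per bucket.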

> enable sample10.q.[Spark Branch]
> --------------------------------
>
>                 Key: HIVE-7776
>                 URL: https://issues.apache.org/jira/browse/HIVE-7776
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>         Attachments: HIVE-7776.1-spark.patch
>
>
> sample10.q contains a dynamic partition operation. This qtest should be enabled once Hive on
> Spark supports dynamic partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
