hive-dev mailing list archives

From "Chengxiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7372) Select query gives unpredictable incorrect result when parallelism is greater than 1 [Spark Branch]
Date Fri, 11 Jul 2014 02:56:07 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058279#comment-14058279 ]

Chengxiang Li commented on HIVE-7372:
-------------------------------------

{quote}
Thanks for working on this, Chengxiang Li. Patch looks good to me. One minor nit, for cloning,
it might be better to reuse some existing utility methods, or put our implementation in a
utility class for later reuse.
{quote}
I treated this as a proof-of-concept workaround and did not pay much attention to the clone
implementation, since we won't need to copy the key/value in later SparkCollector implementations.
But you are right, we need a reasonable coding style at all times. :D
{quote}
Could you please also check if the same problem exists in HiveReduceFunction, where rows
are clustered? If so, that can be addressed in a separate JIRA.
{quote}
HiveReduceFunction uses SparkCollector as well, so it's OK.
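
To illustrate why copying the key/value in a collector can matter, here is a minimal sketch.
It assumes the caller reuses one mutable buffer for every row, which is an assumption about
the cause and is not confirmed in this thread. Buffer, BufferingCollector and
CollectorCloneSketch are hypothetical names for illustration, not Hive or Spark classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectorCloneSketch {

    /** Hypothetical stand-in for a mutable record buffer that a caller reuses per row. */
    static final class Buffer {
        private byte[] bytes = new byte[0];

        void set(byte[] src) {
            bytes = Arrays.copyOf(src, src.length);
        }

        Buffer copy() {
            Buffer b = new Buffer();
            b.set(bytes);
            return b;
        }

        String asString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical collector that buffers values; not Hive's SparkCollector. */
    static final class BufferingCollector {
        private final boolean cloneOnCollect;
        private final List<Buffer> collected = new ArrayList<>();

        BufferingCollector(boolean cloneOnCollect) {
            this.cloneOnCollect = cloneOnCollect;
        }

        void collect(Buffer value) {
            // Without the copy, every list entry aliases the caller's single buffer.
            collected.add(cloneOnCollect ? value.copy() : value);
        }

        List<String> contents() {
            List<String> out = new ArrayList<>();
            for (Buffer b : collected) {
                out.add(b.asString());
            }
            return out;
        }
    }

    /** Emits three rows through one reused buffer and returns what the collector holds. */
    public static List<String> run(boolean cloneOnCollect) {
        BufferingCollector collector = new BufferingCollector(cloneOnCollect);
        Buffer reused = new Buffer();
        for (String row : new String[] {"a", "b", "c"}) {
            reused.set(row.getBytes(StandardCharsets.UTF_8));
            collector.collect(reused);
        }
        return collector.contents();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [c, c, c]: all entries see the last row
        System.out.println(run(true));  // prints [a, b, c]: each entry kept its own copy
    }
}
```

In this sketch the copy is only needed because the collector holds on to values after
collect() returns; a collector that hands each record on immediately would not need it.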


> Select query gives unpredictable incorrect result when parallelism is greater than 1 [Spark Branch]
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7372
>                 URL: https://issues.apache.org/jira/browse/HIVE-7372
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chengxiang Li
>         Attachments: HIVE-7372.patch
>
>
> In SparkClient.java, if the following property is set to a value greater than 1 instead of 1,
> unpredictable, incorrect results may be observed.
> {code}
>     sparkConf.set("spark.default.parallelism", "1");
> {code}
> It's suspected that there are some concurrency issues, as Spark may process multiple
> datasets in a single JVM when parallelism is greater than 1 in order to use multiple cores.
> NO PRECOMMIT TESTS. This is for spark branch only.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
