hive-issues mailing list archives

From "Wang Haihua (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17124) PlanUtils: Rand() is not a failure-tolerant distribution column
Date Sun, 27 May 2018 05:41:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491927#comment-16491927
] 

Wang Haihua commented on HIVE-17124:
------------------------------------

Any update on the review? And which case does this patch fix? We are seeing inconsistent
row counts when we repeatedly run a dynamic-partition query that uses DISTRIBUTE BY rand().
Thanks [~gopalv]

> PlanUtils: Rand() is not a failure-tolerant distribution column
> ---------------------------------------------------------------
>
>                 Key: HIVE-17124
>                 URL: https://issues.apache.org/jira/browse/HIVE-17124
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.3.0, 3.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Major
>         Attachments: HIVE-17124.1.patch
>
>
> {code}
> else {
>       // numPartitionFields = -1 means random partitioning
>       partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
>     }
> {code}
> This causes known data corruption during failure tolerance operations.
> There is a failure-tolerant distribution function inside ReduceSinkOperator, which kicks
> in automatically when no partition columns are used.
> {code}
>     if (partitionEval.length == 0) {
>       // If no partition cols, just distribute the data uniformly
>       // to provide better load balance. If the requirement is to have a single reducer,
>       // we should set the number of reducers to 1. Use a constant seed to make the code
>       // deterministic.
>       if (random == null) {
>         random = new Random(12345);
>       }
>       keyHashCode = random.nextInt();
>     }
> {code}
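
For illustration only, here is a minimal standalone sketch of why the constant seed in the snippet above matters. It is not Hive code: the class name SeededDistributionSketch and the helper assign() are hypothetical, and the bucket count and row count are arbitrary. The assumption is that a re-executed task attempt should route each row to the same reducer as the failed attempt did; a generator seeded with a constant reproduces the same sequence on every attempt, while an unseeded generator generally does not.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededDistributionSketch {

    // Hypothetical helper (not part of Hive): assigns each of `rows` records
    // to one of `reducers` buckets using the supplied random generator.
    static int[] assign(int rows, int reducers, Random random) {
        int[] buckets = new int[rows];
        for (int i = 0; i < rows; i++) {
            // floorMod keeps the bucket index non-negative even when nextInt() is negative.
            buckets[i] = Math.floorMod(random.nextInt(), reducers);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Two "attempts" of the same task with the constant seed from the quoted snippet:
        // both produce the identical row-to-bucket assignment.
        int[] firstAttempt = assign(8, 4, new Random(12345));
        int[] retriedAttempt = assign(8, 4, new Random(12345));
        System.out.println("seeded attempts match: "
                + Arrays.equals(firstAttempt, retriedAttempt));

        // Two attempts with unseeded generators: the assignments will usually differ,
        // so a retried attempt could send rows to different buckets.
        int[] unseededA = assign(8, 4, new Random());
        int[] unseededB = assign(8, 4, new Random());
        System.out.println("unseeded attempts match: "
                + Arrays.equals(unseededA, unseededB));
    }
}
```

The seeded comparison is always true because java.util.Random guarantees the same sequence for the same seed. The unseeded comparison is left without a stated result, since it depends on the run.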



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
