kylin-issues mailing list archives

From "Zhong Yanghong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-3388) Data may become not correct if mappers fail during the cube building step, "distribute by rand()"
Date Mon, 28 May 2018 05:05:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492269#comment-16492269 ]

Zhong Yanghong commented on KYLIN-3388:
---------------------------------------

[Figure: Hive Issue - distribute by rand().png]
As the figure above shows, once the map step has finished, the data for each reducer has been
prepared. Write Di,j for the data that mapper Mi prepared for reducer Rj. Suppose R1 runs first:
it pulls D1,1 and D2,1 from the mappers and finishes. Then R2 starts. Unluckily, M2 is unavailable
this time, so R2 asks for another mapper attempt, M'2, to be started. After M'2 has prepared D'2,1
and D'2,2, R2 pulls D1,2 from M1 and D'2,2 from M'2, and finishes its job.
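The scenario above can be replayed with a small simulation. This is a hypothetical Python sketch, not Kylin or Hive code: two mappers each split five made-up rows between two reducers at random, standing in for "distribute by rand()", and a different random seed stands in for the retry M'2 drawing different random numbers than the first attempt of M2. The names map_task and simulate are illustrative only.

```python
import random


def map_task(rows, num_reducers, rng):
    """Assign each row to a random reducer, mimicking DISTRIBUTE BY rand()."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        partitions[rng.randrange(num_reducers)].append(row)
    return partitions


def simulate(seed_m1, seed_m2, seed_m2_retry):
    """Replay the figure: R1 reads M2's first attempt, R2 reads the retry M'2."""
    m1_rows = [f"a{i}" for i in range(5)]  # hypothetical input split of M1
    m2_rows = [f"b{i}" for i in range(5)]  # hypothetical input split of M2

    d1 = map_task(m1_rows, 2, random.Random(seed_m1))  # D1,1 and D1,2
    d2 = map_task(m2_rows, 2, random.Random(seed_m2))  # D2,1 and D2,2
    r1_input = d1[0] + d2[0]                           # R1 pulls D1,1 & D2,1

    # M2 becomes unavailable and is re-run as M'2; a different seed models
    # the retry drawing different random numbers for the same rows.
    d2_retry = map_task(m2_rows, 2, random.Random(seed_m2_retry))  # D'2,1, D'2,2
    r2_input = d1[1] + d2_retry[1]                     # R2 pulls D1,2 & D'2,2

    output = r1_input + r2_input
    lost = sorted(set(m1_rows + m2_rows) - set(output))
    duplicated = sorted({r for r in output if output.count(r) > 1})
    return lost, duplicated


for retry_seed in range(3, 8):
    lost, duplicated = simulate(1, 2, retry_seed)
    print(f"retry seed {retry_seed}: lost={lost} duplicated={duplicated}")
```

If the retry reuses the first attempt's seed, as in simulate(1, 2, 2), it reproduces M2's split and nothing is lost or duplicated. With other seeds, lost and duplicated rows typically appear, and they always come from M2's input, since M1 ran only once.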

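The reducers' input therefore becomes D1,1 & D2,1 and D1,2 & D'2,2, rather than D1,1 & D2,1 and
D1,2 & D2,2. Because the partitioning of this Hive job is not deterministic ("distribute by rand()"
sends each row to a random reducer), D'2,2 is rarely the same as D2,2. A row of M2's input that went
to R2 in the first attempt but to R1 in the retry is read by neither reducer, while a row that went
to R1 first and to R2 in the retry is read by both. Some rows are thus lost and others duplicated,
so the final result becomes incorrect.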
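By contrast, a fixed partitioner makes the retry harmless. The sketch below is an illustration only, not the fix adopted for KYLIN-3388: zlib.crc32 of the row is a hypothetical stand-in for whatever deterministic key a real job would distribute by. Because M'2 splits its input exactly as M2 did, D'2,2 equals D2,2 and every row reaches exactly one reducer.

```python
import zlib


def partition_of(row, num_reducers):
    """Deterministic partitioner: the same row always goes to the same reducer."""
    return zlib.crc32(row.encode("utf-8")) % num_reducers


def map_task(rows, num_reducers):
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        partitions[partition_of(row, num_reducers)].append(row)
    return partitions


m1_rows = [f"a{i}" for i in range(5)]  # hypothetical input split of M1
m2_rows = [f"b{i}" for i in range(5)]  # hypothetical input split of M2

d1 = map_task(m1_rows, 2)
d2 = map_task(m2_rows, 2)        # first attempt of M2
d2_retry = map_task(m2_rows, 2)  # M'2: the re-run produces the same split

r1_input = d1[0] + d2[0]         # R1 pulls D1,1 & D2,1
r2_input = d1[1] + d2_retry[1]   # R2 pulls D1,2 & D'2,2 (identical to D2,2)

output = sorted(r1_input + r2_input)
print(output == sorted(m1_rows + m2_rows))  # prints True: every row appears exactly once
```

The design point is that the partition of a row depends only on the row itself, so mixing outputs from different attempts of the same mapper cannot lose or duplicate data.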

> Data may become not correct if mappers fail during the cube building step, "distribute by rand()"
> -------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3388
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3388
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Zhong Yanghong
>            Priority: Critical
>         Attachments: Hive Issue - distribute by rand().png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
