hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-20868) SMB Join fails intermittently when TezDummyOperator has child op in getFinalOp in MapRecordProcessor
Date Wed, 07 Nov 2018 18:54:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-20868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-20868:
---------------------------
    Description: 
In MapRecordProcessor::getFinalOp(), for an external cause that is not yet known, the
TezDummyStoreOperator may intermittently have the MergeJoin operator as a child. When this
happens, fetchDone for the DummyOp stays set to true, a value left over from the previous
task. Ideally, fetchDone should be reset for each task. As a result, the join operator skips
rows from that dummy operator, which produces wrong results.

Good init order

{code}
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = TS[3] (core)
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = FIL[24]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[5]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: Iterating children of dummy op DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp returns DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: InitProcessor : setting fetchDone to false
{code}

Bad init order 

{code}
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  getFinalOp child Ops = TS[3] (core)
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  getFinalOp child Ops = FIL[24]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  getFinalOp child Ops = SEL[5]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  getFinalOp child Ops = DUMMY_STORE[45]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  Iterating children of dummy op DUMMY_STORE[45]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  Child of Dummy Op MERGEJOIN[44]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  getFinalOp child Ops = MERGEJOIN[44]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  getFinalOp child Ops = SEL[13]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  getFinalOp child Ops = RS[14]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|:  getFinalOp returns RS[14]
{code}
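
The two traces above can be reproduced with a small model. The following is a minimal sketch, not Hive's actual code: the classes Op and DummyStoreOp and the methods getFinalOp, resetIfFinalOpIsDummy, resetAllDummyStores and buildPlan are hypothetical stand-ins. It assumes that the final op is found by following the first child until an operator has no children, and that fetchDone is reset only when that walk ends at the dummy store. resetAllDummyStores shows one possible order-independent reset; it is not taken from HIVE-20868.1.patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only: these classes are NOT Hive's real operators. */
final class FetchDoneSketch {

    /** Minimal stand-in for an operator in the plan. */
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        /** Adds a child and returns this operator so calls can be nested. */
        Op child(Op c) { children.add(c); return this; }
    }

    /** Stand-in for the dummy store operator, which carries the fetchDone flag. */
    static final class DummyStoreOp extends Op {
        boolean fetchDone;

        DummyStoreOp(String name) { super(name); }
    }

    /** Follows the first child until an operator with no children is reached. */
    static Op getFinalOp(Op start) {
        Op current = start;
        while (!current.children.isEmpty()) {
            current = current.children.get(0);
        }
        return current;
    }

    /** Fragile reset (assumed behaviour): only works when the walk ends at the dummy store. */
    static void resetIfFinalOpIsDummy(Op root) {
        Op last = getFinalOp(root);
        if (last instanceof DummyStoreOp) {
            ((DummyStoreOp) last).fetchDone = false;
        }
    }

    /** Order-independent reset: clears every dummy store reachable from root (tree assumed). */
    static void resetAllDummyStores(Op root) {
        if (root instanceof DummyStoreOp) {
            ((DummyStoreOp) root).fetchDone = false;
        }
        for (Op c : root.children) {
            resetAllDummyStores(c);
        }
    }

    /** Builds TS -> FIL -> SEL -> DUMMY_STORE, optionally followed by MERGEJOIN -> SEL -> RS. */
    static Op buildPlan(DummyStoreOp dummy, boolean dummyHasMergeJoinChild) {
        if (dummyHasMergeJoinChild) {
            dummy.child(new Op("MERGEJOIN[44]").child(new Op("SEL[13]").child(new Op("RS[14]"))));
        }
        return new Op("TS[3]").child(new Op("FIL[24]").child(new Op("SEL[5]").child(dummy)));
    }

    public static void main(String[] args) {
        Op good = buildPlan(new DummyStoreOp("DUMMY_STORE[45]"), false);
        Op bad = buildPlan(new DummyStoreOp("DUMMY_STORE[45]"), true);
        System.out.println(getFinalOp(good).name); // DUMMY_STORE[45]
        System.out.println(getFinalOp(bad).name);  // RS[14]
    }
}
```

In this model the bad order leaves fetchDone at whatever value the previous task set, which is consistent with the missing "setting fetchDone to false" line in the second trace.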

  was:In MapRecordProcessor::getFinalOp() due to external cause(not known), the TezDummyStoreOperator
may have MergeJoin Op as child intermittently. Due to this, the fetchDone remains set to true
for the DummyOp which was set by previous task. Ideally, fetchDone should be reset for each
task. This eventually leads to the join op skip rows from that dummy op resulting in wrong
results.


> SMB Join fails intermittently when TezDummyOperator has child op in getFinalOp in MapRecordProcessor
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20868
>                 URL: https://issues.apache.org/jira/browse/HIVE-20868
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>            Priority: Major
>         Attachments: HIVE-20868.1.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
