hive-issues mailing list archives

From "Gunther Hagleitner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9976) Possible race condition in DynamicPartitionPruner for <200ms tasks
Date Wed, 25 Mar 2015 18:30:53 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380438#comment-14380438 ]

Gunther Hagleitner commented on HIVE-9976:
------------------------------------------

It seems you're setting numexpectedevents to 0 first and then turning around and calling decrement.
Why not just set it to -1? Also, why atomic integers? As far as I can tell, all access to these
maps is synchronized.
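
A minimal sketch of the point about atomics, under stated assumptions: `EventTracker`, `Counter`, `expect`, `received` and `allAccountedFor` are hypothetical names for illustration and are not taken from the HIVE-9976 patch or from DynamicPartitionPruner. It shows that when every read and write of the map happens under the same lock, a plain int counter is sufficient, and that letting the balance go negative makes the accounting independent of whether an event or the expected count arrives first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the DynamicPartitionPruner source.
class EventTracker {
    // Plain mutable holder. It is safe without AtomicInteger only because
    // every access below goes through a synchronized method on this object.
    private static final class Counter {
        int value;
    }

    private final Map<String, Counter> expectedEvents = new HashMap<>();

    // Record how many events a source vertex is expected to send.
    synchronized void expect(String vertex, int count) {
        expectedEvents.computeIfAbsent(vertex, k -> new Counter()).value += count;
    }

    // Record one received event. If the expected count is not known yet,
    // the balance simply goes to -1 and is corrected by a later expect().
    synchronized void received(String vertex) {
        expectedEvents.computeIfAbsent(vertex, k -> new Counter()).value -= 1;
    }

    // True once every tracked vertex has a balance of zero.
    synchronized boolean allAccountedFor() {
        for (Counter c : expectedEvents.values()) {
            if (c.value != 0) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, an event that arrives before its expected count leaves the balance at -1 until `expect` is called, so neither ordering of the two calls produces a mismatch.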

> Possible race condition in DynamicPartitionPruner for <200ms tasks
> ------------------------------------------------------------------
>
>                 Key: HIVE-9976
>                 URL: https://issues.apache.org/jira/browse/HIVE-9976
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>    Affects Versions: 1.0.0
>            Reporter: Gopal V
>            Assignee: Siddharth Seth
>             Fix For: 1.0.1
>
>         Attachments: HIVE-9976.1.patch, llap_vertex_200ms.png
>
>
> Race condition in the DynamicPartitionPruner between DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() for tasks which respond with both the result and success in a single heartbeat sequence.
> {code}
> 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] tez.DynamicPartitionPruner: Expecting: 1, received: 0
> 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: Vertex Input: store_sales initializer failed, vertex=vertex_1424502260528_1113_4_04 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in dynamic parition pruning
> {code}
> !llap_vertex_200ms.png!
> All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, which seems to happen consistently with LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
