hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-790) race condition related to ScriptOperator + UnionOperator
Date Thu, 27 Aug 2009 05:06:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748262#action_12748262 ]

Ning Zhang commented on HIVE-790:
---------------------------------

@zheng, I'll fix the comment and the test query.

As for the new state, "FINISH" may not be the best name for it, but I think we need two states,
because an operator with two or more parents can be in two different situations:
 1) close() has been called on this operator, but there is no guarantee that close() has also
been called on all of its child operators (the FINISH state);
 2) close() has been called on this operator and on all of its children (the CLOSE state).

The current code sets the state to CLOSE at the end of the function, which means all its children
(and eventually its descendants) are closed. So it has the second semantics. What you proposed is
the first semantics; to implement it we need to move the statement that sets the state to CLOSE
to the beginning of the close() function (just after the check that returns if the state is
already CLOSE).

We need both states. If we have just one state (CLOSE) and assign it at the beginning, then
for an operator with two parents, when the first parent calls close(), this operator sets its
state to CLOSE and just returns without calling close() on its children (since the other
parent has not been closed). When the second parent calls close(), this operator just returns
since its state is already CLOSE. So the children end up never being closed. We should not
remove the CLOSE state check at the beginning, since that may cause an operator to be closed
multiple times.
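
To make the failure concrete, here is a minimal single-threaded sketch of the one-state variant described above. It is not the Hive code: the Op class, its fields, and the script/select/union/sink wiring are hypothetical stand-ins, and it assumes an operator only propagates close() once all of its parents are CLOSE.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleStateSketch {
    enum State { INIT, CLOSE }

    // Hypothetical stand-in for an operator; not Hive's Operator class.
    static class Op {
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();
        State state = State.INIT;
        boolean childrenClosed = false;  // set once close() was propagated

        static void connect(Op parent, Op child) {
            parent.children.add(child);
            child.parents.add(parent);
        }

        boolean allParentsClosed() {
            for (Op p : parents) {
                if (p.state != State.CLOSE) return false;
            }
            return true;
        }

        void close() {
            if (state == State.CLOSE) return;   // guard against closing twice
            state = State.CLOSE;                // single state, set at the beginning
            if (!allParentsClosed()) return;    // another parent is still open
            for (Op c : children) c.close();
            childrenClosed = true;
        }
    }

    // Builds script -> union <- select and union -> sink, then closes both parents.
    static Op[] run() {
        Op script = new Op(), select = new Op(), union = new Op(), sink = new Op();
        Op.connect(script, union);
        Op.connect(select, union);
        Op.connect(union, sink);
        script.close();   // union is marked CLOSE but returns early
        select.close();   // union is already CLOSE, so it returns immediately
        return new Op[] { script, select, union, sink };
    }

    public static void main(String[] args) {
        Op[] ops = run();
        System.out.println("union state: " + ops[2].state);  // CLOSE
        System.out.println("sink state:  " + ops[3].state);  // INIT, never closed
    }
}
```

In this sketch the union ends up marked CLOSE without ever calling close() on the sink, which is the situation the paragraph above describes.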

We also cannot use just the CLOSE state as in the current implementation, since the CLOSE
state is set at the end of the close() function. When a parent calls this operator's close(),
the parent's state is not yet CLOSE. So we end up just returning without closing the child
operators. If we have the FINISH state and set it at the beginning of close(), then whenever
a parent calls close(), the parent is in the FINISH state, and this operator can check for
that and treat FINISH the same as CLOSE, except that the operator in the FINISH state has not
returned from close() yet.
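
The two-state idea can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: the Op class, the INIT state, and the timesClosed counter are hypothetical, the sketch is single-threaded, and it does not attempt any of the synchronization the ScriptOperator thread would need.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseStateSketch {
    // INIT is an assumed name for "close() not yet started".
    enum State { INIT, FINISH, CLOSE }

    // Hypothetical stand-in for an operator; not Hive's Operator class.
    static class Op {
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();
        State state = State.INIT;
        int timesClosed = 0;  // how often the body of close() actually ran

        static void connect(Op parent, Op child) {
            parent.children.add(child);
            child.parents.add(parent);
        }

        // A parent in FINISH is treated the same as one in CLOSE.
        boolean allParentsDone() {
            for (Op p : parents) {
                if (p.state == State.INIT) return false;
            }
            return true;
        }

        void close() {
            if (state != State.INIT) return;   // already closing or closed
            if (!allParentsDone()) return;     // wait for the remaining parents
            state = State.FINISH;              // set at the beginning of close()
            timesClosed++;
            for (Op c : children) c.close();   // children see this operator as FINISH
            state = State.CLOSE;               // set once all children have returned
        }
    }

    // Builds script -> union <- select and union -> sink, then closes both parents.
    static Op[] run() {
        Op script = new Op(), select = new Op(), union = new Op(), sink = new Op();
        Op.connect(script, union);
        Op.connect(select, union);
        Op.connect(union, sink);
        script.close();   // union returns early: select has not started closing
        select.close();   // union now sees FINISH/CLOSE parents and closes the sink
        return new Op[] { script, select, union, sink };
    }

    public static void main(String[] args) {
        for (Op op : run()) {
            System.out.println(op.state + " closed " + op.timesClosed + " time(s)");
        }
    }
}
```

With this wiring, every operator ends in CLOSE and the body of close() runs exactly once per operator, because the early-return check prevents a second close and the FINISH state lets the last parent's call pass through to the children.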



> race condition related to ScriptOperator + UnionOperator
> --------------------------------------------------------
>
>                 Key: HIVE-790
>                 URL: https://issues.apache.org/jira/browse/HIVE-790
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>            Assignee: Ning Zhang
>         Attachments: Hive-790.patch
>
>
> ScriptOperator uses a second thread to output the rows to the children operators. In
> a corner case which contains a union, 2 threads might be outputting data into the same operator
> hierarchy and cause race conditions.
> {code}
> CREATE TABLE tablea (cola STRING);
> SELECT *
> FROM (
>     SELECT TRANSFORM(cola)
>     USING 'cat'
>     AS cola
>     FROM tablea
>   UNION ALL
>     SELECT cola as cola
>     FROM tablea
> ) a;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

