hive-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3915) Union with map-only query on one side and two MR job query on the other produces wrong results
Date Fri, 18 Jan 2013 16:10:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557310#comment-13557310 ]

Hudson commented on HIVE-3915:
------------------------------

Integrated in hive-trunk-hadoop1 #24 (See [https://builds.apache.org/job/hive-trunk-hadoop1/24/])
    HIVE-3915 Union with map-only query on one side and two MR job query on the other produces wrong results (Kevin Wilfong via namit) (Revision 1435203)

     Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435203
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
* /hive/trunk/ql/src/test/queries/clientpositive/union33.q
* /hive/trunk/ql/src/test/results/clientpositive/union33.q.out

                
> Union with map-only query on one side and two MR job query on the other produces wrong results
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3915
>                 URL: https://issues.apache.org/jira/browse/HIVE-3915
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.11.0
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>             Fix For: 0.11.0
>
>         Attachments: HIVE-3915.1.patch.txt
>
>
> When a query contains a union with a map-only subquery on one side and a subquery involving
> two sequential map-reduce jobs on the other, it can produce wrong results.  It appears that
> if the map-only query's TableScan operator is processed first, the task involving the union
> is made a root task.  Then, when the other subquery is processed, the second map-reduce job
> gains the task involving the union as a child and is itself made a root task.  This means that
> both the first and second map-reduce jobs are root tasks, so the dependency between the two
> is ignored.  If they are run in parallel (i.e. the cluster has more than one node), no results
> will be produced for the side of the union with the two map-reduce jobs, and only the results
> of the other side of the union will be returned.
> The order in which TableScan operators are processed is crucial to reproducing this bug.  It
> is determined by the order in which values are retrieved from a map and is hence hard to
> predict, so the bug doesn't always reproduce.
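
The scheduling consequence described in the issue can be sketched with a toy model. This is
not Hive's code: the function schedule_waves and the task names MR1, MR2 and UNION are
hypothetical, and the model assumes only that root tasks are launched unconditionally while
every other task waits for all of its parents to finish. The map-only side is folded into
the UNION task for brevity.

```python
from collections import defaultdict


def schedule_waves(root_tasks, children):
    """Return the tasks launched together, wave by wave.

    Assumption of this sketch: roots start immediately; any other task
    starts only after all of its parents have finished.
    """
    # Invert the child map so each task knows which parents it waits on.
    parents = defaultdict(set)
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].add(parent)

    launched, finished, waves = set(), set(), []
    wave = sorted(set(root_tasks))
    while wave:
        waves.append(wave)
        launched.update(wave)
        finished.update(wave)  # every task in a wave runs to completion
        next_wave = set()
        for task in wave:
            for kid in children.get(task, ()):
                if kid not in launched and parents[kid] <= finished:
                    next_wave.add(kid)
        wave = sorted(next_wave)
    return waves


# MR2 consumes MR1's output; UNION consumes MR2's output.
children = {"MR1": ["MR2"], "MR2": ["UNION"]}

# Intended plan: only the first map-reduce job is a root.
correct = schedule_waves(["MR1"], children)

# Plan as described in the issue: the second job is also registered as a root.
buggy = schedule_waves(["MR1", "MR2"], children)

print(correct)
print(buggy)
```

In the second plan MR1 and MR2 land in the same wave, so MR2 can start before MR1 has
written anything, which matches the empty result reported for that side of the union.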

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
