hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1047) Merge tasks in GenMRUnion1
Date Wed, 13 Jan 2010 04:09:54 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1047:
-----------------------------

    Attachment: HIVE-1047.patch

Uploading HIVE-1047.patch, which merges the currTask in GenMRUnion1 rather than GenMRFileSink1.

> Merge tasks in GenMRUnion1
> --------------------------
>
>                 Key: HIVE-1047
>                 URL: https://issues.apache.org/jira/browse/HIVE-1047
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1047.patch
>
>
> In the following query:
> from (select * from src  union all select * from src) s
> insert overwrite table src_multi1 select * where key < 10
> insert overwrite table src_multi2 select * where key > 10 and key < 20;
> There are two topOps (TableScanOperator) for the same MapRed task. In genTableScan1,
> each TableScanOperator creates a new task as currTask, so GenMRUnion1 should merge the two
> tasks into one. Currently GenMRUnion1 does not merge currTask, which forces downstream
> operators such as GenMRFileSink1 to use hacks that effectively merge the two tasks. A cleaner
> way is to merge the tasks in GenMRUnion1, as is already done for join operators etc.
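
The merge described above can be illustrated with a minimal sketch. This is a simplified model under stated assumptions, not the code in HIVE-1047.patch and not Hive's actual API: the `Task` class, the `processUnionBranch` method, and the operator labels are all hypothetical names introduced only for illustration. The idea shown is that the first union branch registers its task, and every later branch folds its top operators into that task, so downstream operators only ever see one currTask.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model only: these are NOT Hive's real classes or method names.
class UnionTaskMergeSketch {

    /** Hypothetical stand-in for a MapRed task: just the list of its top operators. */
    static final class Task {
        final List<String> topOps = new ArrayList<>();

        Task(String topOp) {
            topOps.add(topOp);
        }
    }

    // Remembers which task each union operator was first reached from.
    private final Map<String, Task> taskByUnion = new HashMap<>();

    /**
     * Called once per input branch of a union operator.
     * Returns the task that should be treated as currTask afterwards.
     */
    Task processUnionBranch(String unionOp, Task currTask) {
        Task unionTask = taskByUnion.get(unionOp);
        if (unionTask == null) {
            // First branch to reach the union: its task becomes the union's task.
            taskByUnion.put(unionOp, currTask);
            return currTask;
        }
        if (unionTask != currTask) {
            // Later branch: move its top operators into the union's task,
            // leaving the branch's own task empty.
            unionTask.topOps.addAll(currTask.topOps);
            currTask.topOps.clear();
        }
        return unionTask;
    }

    public static void main(String[] args) {
        UnionTaskMergeSketch ctx = new UnionTaskMergeSketch();
        // Two table scans of src, each starting in its own task (labels are made up).
        Task first = new Task("TS[src#1]");
        Task second = new Task("TS[src#2]");

        Task afterFirst = ctx.processUnionBranch("UNION_1", first);
        Task afterSecond = ctx.processUnionBranch("UNION_1", second);

        System.out.println("same task after both branches: " + (afterFirst == afterSecond));
        System.out.println("merged topOps: " + afterSecond.topOps);
    }
}
```

With the merge done at the union, a file sink reached later would find a single task already holding both table scans and would need no special-case logic of its own.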

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
