hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-8118) Support work that have multiple child works to work around SPARK [Spark Branch]
Date Sun, 12 Oct 2014 03:06:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8118:
------------------------------
    Summary: Support work that have multiple child works to work around SPARK  [Spark Branch]
 (was: SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple
result collectors [Spark Branch])

> Support work that have multiple child works to work around SPARK  [Spark Branch]
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-8118
>                 URL: https://issues.apache.org/jira/browse/HIVE-8118
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>              Labels: Spark-M1
>         Attachments: HIVE-8118.pdf
>
>
> In the current implementation, both SparkMapRecordHandler and SparkReduceRecordHandler take only one result collector, so the corresponding map or reduce task can have only one child. It is very common in multi-insert queries for a map/reduce task to have more than one child. A query like the following has two map tasks as parents:
> {code}
> select name, sum(value) from dec group by name union all select name, value from dec order by name
> {code}
> It's possible that in the future an optimization may be implemented so that a map work is followed by two reduce works and then connected to a union work.
> Thus, we should treat this as the general case. Tez currently provides a collector for each child operator in the map-side or reduce-side operator tree. We can take Tez as a reference.
> This is likely a big change, and subtasks are possible.
> With this, we can have a simpler and cleaner multi-insert implementation. This is also the problem observed in HIVE-7731.
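
The per-child collector idea in the description above can be illustrated with a minimal Java sketch. It is not Hive or Tez code: ResultCollector, BufferingCollector, RecordHandler, and the child names "reduce1" and "reduce2" are all hypothetical, chosen only to show a handler that keeps one collector per child work instead of a single collector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiCollectorSketch {

    /** Hypothetical stand-in for a result collector; not Hive's actual interface. */
    public interface ResultCollector {
        void collect(String key, String value);
    }

    /** Hypothetical collector that buffers the records routed to one child work. */
    public static class BufferingCollector implements ResultCollector {
        public final List<String> records = new ArrayList<>();

        @Override
        public void collect(String key, String value) {
            records.add(key + "=" + value);
        }
    }

    /** Hypothetical record handler holding one collector per child work. */
    public static class RecordHandler {
        private final Map<String, ResultCollector> collectors = new LinkedHashMap<>();

        /** Registers the collector that receives output for the named child. */
        public void addCollector(String childName, ResultCollector collector) {
            collectors.put(childName, collector);
        }

        /** Routes one record to the collector registered for the given child. */
        public void emit(String childName, String key, String value) {
            ResultCollector collector = collectors.get(childName);
            if (collector == null) {
                throw new IllegalArgumentException("No collector for child: " + childName);
            }
            collector.collect(key, value);
        }
    }

    public static void main(String[] args) {
        BufferingCollector first = new BufferingCollector();
        BufferingCollector second = new BufferingCollector();

        RecordHandler handler = new RecordHandler();
        handler.addCollector("reduce1", first);
        handler.addCollector("reduce2", second);

        // One parent work feeds two children, each through its own collector.
        handler.emit("reduce1", "alice", "3");
        handler.emit("reduce2", "bob", "5");

        System.out.println("reduce1 received " + first.records);
        System.out.println("reduce2 received " + second.records);
    }
}
```

Keying the collectors by child name is only one possible design; the point is that output for each child stays separate, so a parent work with several children does not need a special case.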



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
