hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-549) Parallel Execution Mechanism
Date Tue, 03 Nov 2009 23:54:32 GMT

    [ https://issues.apache.org/jira/browse/HIVE-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773294#action_12773294 ]

Ning Zhang commented on HIVE-549:
---------------------------------

Just curious: how do we handle exceptions when tasks are executed in parallel? I am asking
because we will probably need to set a global configuration at some point during query execution
(e.g., change the replication factor of a table in a map-side join) and change it back
when the whole query finishes or when an exception is caught at the end.

It seems to me that the parallel execution paradigm can be applied only if, at every point,
there is a child task waiting on all of the parallel tasks. That child is responsible for rolling
back if any exception is caught. One example I can think of that cannot use the parallel paradigm
is the multi-insert case, where each insert is a branch and the branches don't "meet" at the bottom
of the plan.
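
To make the concern concrete, here is a minimal sketch in plain Java of the "child waits on all
parallel tasks, then rolls back" pattern. It is not Hive's task code and not what the attached
patch does: the class ParallelBranches, the conf map and runWithTemporarySetting are hypothetical
names made up for illustration, and the map only stands in for a query-wide configuration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from Hive.
class ParallelBranches {
    // Stand-in for a global, query-wide configuration (assumption).
    static final Map<String, String> conf = new ConcurrentHashMap<>();

    // Applies a temporary setting, runs every branch in parallel, waits for
    // all of them (the "child" join point), and restores the old value
    // whether or not a branch failed. Returns true only if all branches succeeded.
    static boolean runWithTemporarySetting(String key, String tempValue,
                                           List<Callable<Void>> branches) {
        String oldValue = conf.get(key);
        conf.put(key, tempValue);
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, branches.size()));
        boolean allSucceeded = true;
        try {
            // invokeAll returns only after every branch has finished or thrown.
            for (Future<Void> result : pool.invokeAll(branches)) {
                try {
                    result.get();
                } catch (ExecutionException e) {
                    allSucceeded = false; // one branch failed; keep checking the rest
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            allSucceeded = false;
        } finally {
            pool.shutdown();
            // Roll back the global setting in exactly one place.
            if (oldValue == null) {
                conf.remove(key);
            } else {
                conf.put(key, oldValue);
            }
        }
        return allSucceeded;
    }
}
```

The rollback sits in a single finally block only because every branch is joined there. In the
multi-insert case there is no such join point in the plan, so something outside the plan (for
example, whatever drives the whole query) would presumably have to take over that role.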

> Parallel Execution Mechanism
> ----------------------------
>
>                 Key: HIVE-549
>                 URL: https://issues.apache.org/jira/browse/HIVE-549
>             Project: Hadoop Hive
>          Issue Type: Wish
>          Components: Query Processor
>            Reporter: Adam Kramer
>            Assignee: Chaitanya Mishra
>         Attachments: HIVE-549-v3.patch
>
>
> In a massively parallel database system, it would be awesome to also parallelize some
> of the mapreduce phases that our data needs to go through.
> One example that just occurred to me is UNION ALL: when you union two SELECT statements,
> effectively you could run those statements in parallel. There's no situation (that I can think
> of, but I don't have a formal proof) in which the left statement would rely on the right statement,
> or vice versa. So, they could be run at the same time...and perhaps they should be. Or, perhaps
> there should be a way to make this happen...PARALLEL UNION ALL? PUNION ALL?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.