hadoop-hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-549) Parallel Execution Mechanism
Date Fri, 23 Oct 2009 20:49:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769433#action_12769433 ]

He Yongqiang commented on HIVE-549:
-----------------------------------

I agree with adding a new TaskRunner class that extends Thread.
I think we can define several states for each TaskRunner.
A simple example:
1. Iterate over the task queue and launch a new TaskRunner for each task.
2. A TaskRunner serving a particular task first checks the states of all of its parents:
   2.1 If any parent has failed, it fails itself.
   2.2 If all parents have succeeded, it launches itself and marks itself as running.
   2.3 If any parent has not yet finished (it is still pending or running), it marks itself
       as pending and adds itself to the waiting queue of each unfinished parent.
3. When a task finishes, whether successfully or not, it notifies all children in its waiting
   queue of whether it succeeded or failed.
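
To make the steps above concrete, here is a minimal sketch of the thread-per-task version. The names TaskRunnerSketch, TaskState, taskState() and runAll are made up for illustration and are not taken from the Hive code base. Two simplifying assumptions: a CountDownLatch on each parent stands in for the parent's waiting queue, and a runner checks its parents one at a time, so a failed parent is noticed only after the parents listed before it have finished.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Illustrative sketch only: TaskState, TaskRunner and runAll are hypothetical
// names, not classes from the Hive code base.
class TaskRunnerSketch {
    enum TaskState { PENDING, RUNNING, SUCCEEDED, FAILED }

    static class TaskRunner extends Thread {
        private final Runnable work;
        private final List<TaskRunner> parents;
        // Released once this task succeeds or fails. Children block on it,
        // so it plays the role of the parent's "waiting queue" (assumption).
        private final CountDownLatch finished = new CountDownLatch(1);
        private volatile TaskState taskState = TaskState.PENDING;

        TaskRunner(Runnable work, TaskRunner... parents) {
            this.work = work;
            this.parents = Arrays.asList(parents);
        }

        TaskState taskState() { return taskState; }

        private void finish(TaskState result) {
            taskState = result;
            finished.countDown(); // step 3: wake every child waiting on this task
        }

        @Override
        public void run() {
            try {
                for (TaskRunner parent : parents) {
                    parent.finished.await();                    // step 2.3: stay pending
                    if (parent.taskState == TaskState.FAILED) { // step 2.1: a parent failed
                        finish(TaskState.FAILED);
                        return;
                    }
                }
                taskState = TaskState.RUNNING;                  // step 2.2: all parents succeeded
                work.run();
                finish(TaskState.SUCCEEDED);
            } catch (InterruptedException | RuntimeException e) {
                finish(TaskState.FAILED);                       // the task's own failure
            }
        }
    }

    // Step 1: launch one runner per task, then wait for all of them to end.
    static void runAll(List<TaskRunner> runners) {
        for (TaskRunner r : runners) r.start();
        for (TaskRunner r : runners) {
            try {
                r.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

With this sketch, a task whose parent throws is never executed and ends up FAILED, while independent tasks still run concurrently.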
Obviously, this simple algorithm has one problem: each task gets a dedicated thread allocated
to it. It would be better if we could reduce the number of threads.
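
One possible way to cap the thread count, sketched here under assumptions and not proposed as the actual design: keep a counter of unfinished parents on each task and hand a task to a fixed-size pool only once that counter reaches zero, so no thread sits idle waiting on a parent. PooledSchedulerSketch, Node, runAll and maxThreads are hypothetical names. The sketch also assumes that the list passed to runAll contains every task in the graph and that the graph has no cycles. A task with a failed parent is marked FAILED once all of its parents have finished, and it is never executed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: all names here are hypothetical, not Hive classes.
class PooledSchedulerSketch {
    enum TaskState { PENDING, RUNNING, SUCCEEDED, FAILED }

    static class Node {
        final Runnable work;
        final List<Node> children = new ArrayList<>();
        final AtomicInteger unfinishedParents;
        volatile boolean parentFailed = false;
        volatile TaskState state = TaskState.PENDING;

        Node(Runnable work, Node... parents) {
            this.work = work;
            this.unfinishedParents = new AtomicInteger(parents.length);
            for (Node p : parents) p.children.add(this);
        }
    }

    // Runs the whole graph on at most maxThreads threads and blocks until
    // every node has either succeeded or failed.
    static void runAll(List<Node> nodes, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        CountDownLatch allDone = new CountDownLatch(nodes.size());
        // Collect the roots before submitting anything, so a fast parent
        // cannot cause one of its children to be submitted twice.
        List<Node> roots = new ArrayList<>();
        for (Node n : nodes) {
            if (n.unfinishedParents.get() == 0) roots.add(n);
        }
        for (Node n : roots) submit(pool, n, allDone);
        try {
            allDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
    }

    private static void submit(ExecutorService pool, Node n, CountDownLatch allDone) {
        pool.execute(() -> {
            n.state = TaskState.RUNNING;
            TaskState result;
            try {
                n.work.run();
                result = TaskState.SUCCEEDED;
            } catch (RuntimeException e) {
                result = TaskState.FAILED;
            }
            complete(pool, n, result, allDone);
        });
    }

    // Records the result and notifies the children; a child is scheduled
    // (or failed) by whichever parent finishes last.
    private static void complete(ExecutorService pool, Node n, TaskState result,
                                 CountDownLatch allDone) {
        n.state = result;
        allDone.countDown();
        for (Node child : n.children) {
            if (result == TaskState.FAILED) child.parentFailed = true;
            if (child.unfinishedParents.decrementAndGet() == 0) {
                if (child.parentFailed) {
                    complete(pool, child, TaskState.FAILED, allDone); // never executed
                } else {
                    submit(pool, child, allDone);
                }
            }
        }
    }
}
```

The trade-off is that the dependency bookkeeping moves from blocked threads into explicit counters, which keeps the number of threads independent of the number of tasks.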

> Parallel Execution Mechanism
> ----------------------------
>
>                 Key: HIVE-549
>                 URL: https://issues.apache.org/jira/browse/HIVE-549
>             Project: Hadoop Hive
>          Issue Type: Wish
>          Components: Query Processor
>            Reporter: Adam Kramer
>            Assignee: Chaitanya Mishra
>
> In a massively parallel database system, it would be awesome to also parallelize some
> of the mapreduce phases that our data needs to go through.
> One example that just occurred to me is UNION ALL: when you union two SELECT statements,
> effectively you could run those statements in parallel. There's no situation (that I can think
> of, but I don't have a formal proof) in which the left statement would rely on the right statement,
> or vice versa. So, they could be run at the same time...and perhaps they should be. Or, perhaps
> there should be a way to make this happen...PARALLEL UNION ALL? PUNION ALL?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

