hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-549) Parallel Execution Mechanism
Date Tue, 03 Nov 2009 20:10:32 GMT

    [ https://issues.apache.org/jira/browse/HIVE-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773160#action_12773160 ]

Zheng Shao commented on HIVE-549:
---------------------------------

We have a task type called ConditionalTask, which may or may not submit a map-reduce task at
runtime. It's true that map-reduce jobs usually take longer than other tasks, but given
ConditionalTask it seems better to treat all tasks the same when reporting progress.

I agree that "1/7, 4/7 and 2/7" on the job tracker may seem bad, but the user should see all
7 tasks starting and finishing on the command line.
We can print task start/finish information on the command line for all tasks.
For example, every time a task starts or finishes, we can print
"Stage-3 started.  Total: 7, Pending: 3, Running: 2, Finished: 2", or
"Stage-3 finished.  Total: 7, Pending: 3, Running: 2, Finished: 2"
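
To make the proposed message format concrete, here is a minimal sketch in Java. It is not taken
from the HIVE-549 patch: the class name StageProgress and its methods are hypothetical, and it
assumes the counts are reported after the start/finish event has been applied. Every stage is
counted the same way, whether or not it ends up submitting a map-reduce job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical progress tracker for illustration only; not part of Hive. */
class StageProgress {
    enum State { PENDING, RUNNING, FINISHED }

    // Insertion-ordered map from stage name to its current state.
    private final Map<String, State> stages = new LinkedHashMap<>();

    StageProgress(String... stageNames) {
        for (String name : stageNames) {
            stages.put(name, State.PENDING);
        }
    }

    /** Marks a pending stage as running and returns the line to print. */
    synchronized String start(String stage) {
        transition(stage, State.PENDING, State.RUNNING);
        return stage + " started.  " + summary();
    }

    /** Marks a running stage as finished and returns the line to print. */
    synchronized String finish(String stage) {
        transition(stage, State.RUNNING, State.FINISHED);
        return stage + " finished.  " + summary();
    }

    private void transition(String stage, State from, State to) {
        State current = stages.get(stage);
        if (current != from) {
            throw new IllegalStateException(
                stage + " is " + current + ", expected " + from);
        }
        stages.put(stage, to);
    }

    // Counts stages per state; all task types are treated alike.
    private String summary() {
        int pending = 0, running = 0, finished = 0;
        for (State s : stages.values()) {
            switch (s) {
                case PENDING: pending++; break;
                case RUNNING: running++; break;
                default: finished++; break;
            }
        }
        return "Total: " + stages.size() + ", Pending: " + pending
            + ", Running: " + running + ", Finished: " + finished;
    }

    public static void main(String[] args) {
        StageProgress progress = new StageProgress("Stage-1", "Stage-2", "Stage-3");
        System.out.println(progress.start("Stage-1"));
        System.out.println(progress.start("Stage-2"));
        System.out.println(progress.finish("Stage-1"));
    }
}
```

The methods are synchronized because, with parallel execution, several stages could report
start or finish events from different threads at the same time.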

What do you think?

By the way, I am OK with finishing this issue first and then improving the progress information
in a separate transaction, if you want.


> Parallel Execution Mechanism
> ----------------------------
>
>                 Key: HIVE-549
>                 URL: https://issues.apache.org/jira/browse/HIVE-549
>             Project: Hadoop Hive
>          Issue Type: Wish
>          Components: Query Processor
>            Reporter: Adam Kramer
>            Assignee: Chaitanya Mishra
>         Attachments: HIVE-549-v3.patch
>
>
> In a massively parallel database system, it would be awesome to also parallelize some
> of the mapreduce phases that our data needs to go through.
> One example that just occurred to me is UNION ALL: when you union two SELECT statements,
> effectively you could run those statements in parallel. There's no situation (that I can think
> of, but I don't have a formal proof) in which the left statement would rely on the right statement,
> or vice versa. So, they could be run at the same time...and perhaps they should be. Or, perhaps
> there should be a way to make this happen...PARALLEL UNION ALL? PUNION ALL?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

