hive-dev mailing list archives

From "Chaitanya Mishra (JIRA)" <>
Subject [jira] Updated: (HIVE-549) Parallel Execution Mechanism
Date Tue, 27 Oct 2009 21:30:59 GMT


Chaitanya Mishra updated HIVE-549:

    Attachment: Hive-549.patch

Attaching a patch for this. I hope it's in the right format.

Summary of changes:
- Created, which launches new tasks as threads.
- Created, which encapsulates the return value of the thread.
- Modified execute() function of ql/ to launch tasks as soon as they are runnable.
- Also modified the Utilities.gWork variable to be ThreadLocal, so that each thread keeps
its own state independently of the others.
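
For illustration only, the first two items and the ThreadLocal change can be sketched as below. This is not the code in the patch: the names TaskThreadSketch, Runner, Result and currentWork are hypothetical, and currentWork merely stands in for a per-thread variable such as Utilities.gWork.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskThreadSketch {
    // Hypothetical stand-in for per-thread state such as Utilities.gWork.
    static final ThreadLocal<String> currentWork = new ThreadLocal<>();

    // Hypothetical result holder: carries the outcome out of the worker thread.
    public static final class Result {
        private volatile int exitCode = -1;
        private volatile String workSeen;

        public int getExitCode() { return exitCode; }
        public String getWorkSeen() { return workSeen; }
    }

    // Hypothetical runner: executes one task on its own thread.
    static final class Runner extends Thread {
        private final String work;
        private final Result result;

        Runner(String work, Result result) {
            this.work = work;
            this.result = result;
        }

        @Override
        public void run() {
            currentWork.set(work); // visible to this thread only
            try {
                // A real task would execute here; the sketch only records
                // which work value this thread observed.
                result.workSeen = currentWork.get();
                result.exitCode = 0;
            } finally {
                currentWork.remove();
            }
        }
    }

    // Starts one thread per work item, waits for all of them, returns the results.
    public static List<Result> runAll(List<String> works) {
        List<Runner> runners = new ArrayList<>();
        List<Result> results = new ArrayList<>();
        for (String work : works) {
            Result result = new Result();
            Runner runner = new Runner(work, result);
            results.add(result);
            runners.add(runner);
            runner.start();
        }
        try {
            for (Runner runner : runners) {
                runner.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

Because each thread sets and reads its own copy of currentWork, two tasks running at the same time cannot overwrite each other's state.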

The end result of this patch is that a task (which is a part of a query plan) is launched as
soon as it is runnable, instead of waiting in a queue.
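
A minimal sketch of the "launch as soon as runnable" idea, assuming a task is runnable once all the tasks it depends on have finished. The class RunnableLaunchSketch and its execute method are hypothetical and are not the execute() function changed by the patch; the task bodies are placeholders.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RunnableLaunchSketch {
    /**
     * deps maps each task name to the names of the tasks it must wait for.
     * Returns the task names in the order in which they completed.
     */
    public static List<String> execute(Map<String, Set<String>> deps) {
        ExecutorService pool = Executors.newCachedThreadPool();
        CompletionService<String> done = new ExecutorCompletionService<>(pool);
        Set<String> launched = new HashSet<>();
        Set<String> finished = new HashSet<>();
        List<String> order = new ArrayList<>();
        try {
            while (finished.size() < deps.size()) {
                // Launch every task whose dependencies have all finished.
                for (Map.Entry<String, Set<String>> entry : deps.entrySet()) {
                    final String name = entry.getKey();
                    if (!launched.contains(name) && finished.containsAll(entry.getValue())) {
                        launched.add(name);
                        done.submit(() -> name); // placeholder for the real task work
                    }
                }
                // Nothing in flight and work remaining: the plan can never complete.
                if (launched.size() == finished.size()) {
                    throw new IllegalArgumentException("cycle or unknown dependency");
                }
                // Block until any running task finishes, then look for new runnable tasks.
                String name = done.take().get();
                finished.add(name);
                order.add(name);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return order;
    }
}
```

With a plan in which "union" depends on "left" and "right", the two independent tasks are submitted in the same pass and "union" is submitted only after both have completed, which mirrors the UNION ALL case described in the issue.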


> Parallel Execution Mechanism
> ----------------------------
>                 Key: HIVE-549
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Wish
>          Components: Query Processor
>            Reporter: Adam Kramer
>            Assignee: Chaitanya Mishra
>         Attachments: Hive-549.patch
> In a massively parallel database system, it would be awesome to also parallelize some
> of the mapreduce phases that our data needs to go through.
> One example that just occurred to me is UNION ALL: when you union two SELECT statements,
> effectively you could run those statements in parallel. There's no situation (that I can think
> of, but I don't have a formal proof) in which the left statement would rely on the right statement,
> or vice versa. So, they could be run at the same time...and perhaps they should be. Or, perhaps
> there should be a way to make this happen...PARALLEL UNION ALL? PUNION ALL?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
