hadoop-hive-dev mailing list archives

From "Chaitanya Mishra (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-549) Parallel Execution Mechanism
Date Mon, 23 Nov 2009 21:51:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781643#action_12781643 ]

Chaitanya Mishra commented on HIVE-549:
---------------------------------------

This patch differs from the previous patch in the following ways:

(a) mapredWork is no longer stored as a thread-local variable. Instead, we maintain a map
from job name -> mapredWork. This was essential because Hive can launch tasks in LocalJobRunner
mode. This is pretty much Zheng's original suggestion.

(b) There is additional code to guarantee that every job has a randomly generated name, so
that the code doesn't break.

(c) There is also code to ensure that the distributed cache has a unique handle for plan information.
Originally it was always stored as HIVE_PLAN.

(d) SessionState was a thread-local variable, so new code has been added to initialize SessionState
for new threads.

(e) Only map-reduce tasks are launched in new threads. Non-map-reduce tasks are launched
within the driver thread itself. This ensures that simple tasks such as DESCRIBE FUNCTION
don't pay the cost of launching a thread, then sleeping and polling for thread completion.

(f) At most maxthreads=8 threads are launched.
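
Points (a), (b), (e) and (f) can be illustrated with a rough Java sketch. This is not code
from the patch: ParallelLaunchSketch, PLAN_BY_JOB, uniqueJobName, registerPlan, Task and
runAll are hypothetical names, a plain String stands in for mapredWork, and a
java.util.concurrent fixed thread pool is substituted for the sleep-and-poll loop that the
patch describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch only; none of these names come from the HIVE-549 patch. */
class ParallelLaunchSketch {
    // (f) upper bound on concurrently running task threads
    static final int MAX_THREADS = 8;

    // (a) plans are keyed by job name instead of living in a ThreadLocal,
    // so any thread that knows the job name can look the plan up.
    static final Map<String, String> PLAN_BY_JOB = new ConcurrentHashMap<>();

    // (b) every job gets a randomly generated, unique name
    static String uniqueJobName(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }

    static String registerPlan(String plan) {
        String jobName = uniqueJobName("hive-job");
        PLAN_BY_JOB.put(jobName, plan);
        return jobName;
    }

    /** Minimal stand-in for a task; the String plan replaces mapredWork. */
    static final class Task {
        final String plan;
        final boolean isMapReduce;

        Task(String plan, boolean isMapReduce) {
            this.plan = plan;
            this.isMapReduce = isMapReduce;
        }
    }

    // (e) map-reduce tasks go to worker threads; everything else runs
    // inline in the calling (driver) thread.
    static List<String> runAll(List<Task> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_THREADS);
        List<Future<String>> pending = new ArrayList<>();
        List<String> results = new ArrayList<>();
        try {
            for (Task t : tasks) {
                if (t.isMapReduce) {
                    String jobName = registerPlan(t.plan);
                    // The worker finds its plan by job name, not via a ThreadLocal.
                    pending.add(pool.submit(() -> "ran " + PLAN_BY_JOB.remove(jobName)));
                } else {
                    results.add("ran " + t.plan);
                }
            }
            // Wait for the threaded tasks in submission order.
            for (Future<String> f : pending) {
                results.add(f.get());
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

In this sketch the inline tasks finish before the driver starts waiting on the threaded
ones, so their results come first in the returned list.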

> Parallel Execution Mechanism
> ----------------------------
>
>                 Key: HIVE-549
>                 URL: https://issues.apache.org/jira/browse/HIVE-549
>             Project: Hadoop Hive
>          Issue Type: Wish
>          Components: Query Processor
>            Reporter: Adam Kramer
>            Assignee: Chaitanya Mishra
>         Attachments: HIVE-549-v4.patch, HIVE-549-v5.patch
>
>
> In a massively parallel database system, it would be awesome to also parallelize some
> of the mapreduce phases that our data needs to go through.
> One example that just occurred to me is UNION ALL: when you union two SELECT statements,
> effectively you could run those statements in parallel. There's no situation (that I can think
> of, but I don't have a formal proof) in which the left statement would rely on the right statement,
> or vice versa. So, they could be run at the same time...and perhaps they should be. Or, perhaps
> there should be a way to make this happen...PARALLEL UNION ALL? PUNION ALL?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

