hive-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
Date Tue, 20 Sep 2016 18:21:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507333#comment-15507333 ]

Siddharth Seth commented on HIVE-7926:
--------------------------------------

bq. In the sentence: “The initial stage of the query is pushed into #LLAP, large shuffle
is performed in their own containers” - What does "their own containers" refer to? Is there
only one large shuffle, or multiple shuffles?
When executing a query, it is possible to launch separate containers (Java processes, i.e. a
fallback to regular Tez execution) to perform the large shuffles. In many cases, running a
Shuffle/Reduce within LLAP may not be beneficial (no caching gains, etc.). That said, it is also
possible to run these Shuffle/Reduce steps within LLAP itself, and that is the typical case
for short-running queries. Multiple shuffles are possible.
This point is primarily about where a reduce will run: within the LLAP daemon itself,
or in a separate container (process).
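
As a rough illustration of the choice described above: Hive has a session-level setting, hive.llap.execution.mode, that controls which fragments are sent to LLAP. The snippet below is a sketch, not a recommendation. It assumes a Hive 2.x deployment with LLAP daemons already running, and the comments paraphrase the option values, so check them against the HiveConf of the release in use.

```sql
-- Assumption: Hive 2.x on Tez with LLAP daemons already running.

-- Run only the initial (map) stage in LLAP; shuffles/reduces run in
-- separate Tez containers.
SET hive.llap.execution.mode=map;

-- Alternatively, try to run all fragments, including Shuffle/Reduce,
-- inside the LLAP daemons.
SET hive.llap.execution.mode=all;
```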

bq. In the sentence: "The node allows parallel execution for multiple query fragments from
different queries and sessions" - what does "the node" refer to? A single LLAP node?
Yes, that refers to a single LLAP instance. A single LLAP process can handle multiple fragments
at once, whether they come from different queries or from the same query.

> long-lived daemons for query fragment execution, I/O and caching
> ----------------------------------------------------------------
>
>                 Key: HIVE-7926
>                 URL: https://issues.apache.org/jira/browse/HIVE-7926
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: TODOC2.0
>             Fix For: 2.0.0
>
>         Attachments: LLAPdesigndocument.pdf
>
>
> We are proposing a new execution model for Hive that is a combination of existing process-based
> tasks and long-lived daemons running on worker nodes. These nodes can take care of efficient
> I/O, caching and query fragment execution, while heavy lifting like most joins, ordering,
> etc. can be handled by tasks.
> The proposed model is not a 2-system solution for small and large queries; nor is it
> a separate execution engine like MR or Tez. It can be used by any Hive execution engine,
> if support is added; in the future even external products (e.g. Pig) can use it.
> The document with the high-level design we are proposing will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
