hive-dev mailing list archives

From "Brock Noland (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7768) Research growing/shrinking our Spark Application
Date Mon, 18 Aug 2014 20:31:18 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-7768:
-------------------------------

    Description: 
Scenario:

A user connects to Hive and runs a query on a small table. Our SC is sized for that small table.
They then run a query on a much larger table. We'll need to "re-size" the SC, which I don't
think Spark supports today, so we need to research what is available in Spark today and how
Tez works.
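
To make the size mismatch concrete, here is a rough back-of-the-envelope sketch. It is
not Spark or Hive code: the function name, the 128 MiB split size, and the four tasks per
slave are assumptions chosen purely for illustration. How Spark actually sizes an
application is one of the things this JIRA is meant to research.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def executors_needed(input_bytes, split_bytes=128 * MIB, tasks_per_executor=4):
    """Hypothetical estimate: one task per input split, all tasks run in one wave."""
    tasks = max(1, math.ceil(input_bytes / split_bytes))
    return max(1, math.ceil(tasks / tasks_per_executor))


# An SC sized for the first (small) query is far too small for the second one.
sized_for_small = executors_needed(256 * MIB)
wanted_for_large = executors_needed(512 * GIB)
print(sized_for_small, wanted_for_large)
```

Under these assumed numbers the two queries differ by about three orders of magnitude in
the number of slaves they would ideally use, which is the gap a "re-size" would have to close.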

More details:
Similar to Tez, it's likely our "SparkContext" is going to be long lived and process many
queries. Some queries will be large and some small. Additionally the SC might be idle for
long periods of time.

In this JIRA we will research the following:

* How Spark decides the number of slaves for a given RDD today
* Given an SC, when you create a new RDD based on a much larger input dataset, does the SC adjust?
* How Tez increases/decreases the size of the running YARN application (set of slaves)
* How Tez handles scenarios when it has a running set of slaves in YARN and requests more
resources for a query and fails to get additional resources
* How Tez decides to time out idle slaves

This will guide requirements we'll need from YARN.

  was:
Similar to Tez, it's likely our "SparkContext" is going to be long lived and process many
queries. Some queries will be large and some small. Additionally the SC might be idle for
long periods of time.

In this JIRA we will research the following:

* How Spark decides the number of slaves for a given RDD today
* Given an SC, when you create a new RDD based on a much larger input dataset, does the SC adjust?
* How Tez increases/decreases the size of the running YARN application (set of slaves)
* How Tez handles scenarios when it has a running set of slaves in YARN and requests more
resources for a query and fails to get additional resources
* How Tez decides to time out idle slaves

This will guide requirements we'll need from YARN.


> Research growing/shrinking our Spark Application
> ------------------------------------------------
>
>                 Key: HIVE-7768
>                 URL: https://issues.apache.org/jira/browse/HIVE-7768
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Brock Noland
>
> Scenario:
> A user connects to Hive and runs a query on a small table. Our SC is sized for that small
> table. They then run a query on a much larger table. We'll need to "re-size" the SC, which
> I don't think Spark supports today, so we need to research what is available in Spark today
> and how Tez works.
> More details:
> Similar to Tez, it's likely our "SparkContext" is going to be long lived and process
> many queries. Some queries will be large and some small. Additionally the SC might be idle
> for long periods of time.
> In this JIRA we will research the following:
> * How Spark decides the number of slaves for a given RDD today
> * Given an SC, when you create a new RDD based on a much larger input dataset, does the
> SC adjust?
> * How Tez increases/decreases the size of the running YARN application (set of slaves)
> * How Tez handles scenarios when it has a running set of slaves in YARN and requests
> more resources for a query and fails to get additional resources
> * How Tez decides to time out idle slaves
> This will guide requirements we'll need from YARN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
