Date: Thu, 6 Nov 2014 00:10:34 +0000 (UTC)
From: "Xuefu Zhang (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-7768) Research growing/shrinking our Spark Application [Spark Branch]

    [ https://issues.apache.org/jira/browse/HIVE-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199423#comment-14199423 ]

Xuefu Zhang commented on HIVE-7768:
-----------------------------------

Given that SPARK-3174 is resolved, this research should continue. Eventually, we need to figure out how to integrate with it. [~venki387], would you have time on this? Thanks.
> Research growing/shrinking our Spark Application [Spark Branch]
> ---------------------------------------------------------------
>
>                 Key: HIVE-7768
>                 URL: https://issues.apache.org/jira/browse/HIVE-7768
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Brock Noland
>            Assignee: Venki Korukanti
>            Priority: Critical
>
> Scenario:
> A user connects to Hive and runs a query on a small table. Our SC is sized for that small table. They then run a query on a much larger table. We'll need to "re-size" the SC, which I don't think Spark supports today, so we need to research what is available today in Spark and how Tez works.
> More details:
> Similar to Tez, it's likely our "SparkContext" is going to be long-lived and process many queries. Some queries will be large and some small. Additionally, the SC might be idle for long periods of time.
> In this JIRA we will research the following:
> * How Spark decides the number of slaves for a given RDD today
> * Given a SC, when you create a new RDD based on a much larger input dataset, does the SC adjust?
> * How Tez increases/decreases the size of the running YARN application (set of slaves)
> * How Tez handles scenarios when it has a running set of slaves in YARN, requests more resources for a query, and fails to get additional resources
> * How Tez decides to time out idle slaves
> This will guide requirements we'll need from Spark.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
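
The re-sizing problem in the scenario above (small table, then a much larger table, then idle periods) can be illustrated with a minimal sketch. It is not how Spark or Tez actually size an application; the class name, the method names, and the one-executor-per-N-pending-tasks rule are all hypothetical and exist only to make the grow/shrink/idle questions concrete.

```java
public class ExecutorSizingSketch {

    // Hypothetical policy: request one executor per `tasksPerExecutor`
    // pending tasks, clamped to the range [minExecutors, maxExecutors].
    static int targetExecutors(int pendingTasks, int tasksPerExecutor,
                               int minExecutors, int maxExecutors) {
        // Ceiling division so a partial batch of tasks still gets an executor.
        int needed = (pendingTasks + tasksPerExecutor - 1) / tasksPerExecutor;
        return Math.max(minExecutors, Math.min(maxExecutors, needed));
    }

    // Hypothetical idle rule: an executor may be released once it has been
    // idle for at least the configured timeout.
    static boolean shouldRelease(long idleMillis, long idleTimeoutMillis) {
        return idleMillis >= idleTimeoutMillis;
    }

    public static void main(String[] args) {
        System.out.println(targetExecutors(8, 4, 1, 50));      // query on a small table
        System.out.println(targetExecutors(4000, 4, 1, 50));   // query on a much larger table, capped at the maximum
        System.out.println(targetExecutors(0, 4, 1, 50));      // idle context, falls back to the minimum
        System.out.println(shouldRelease(90_000L, 60_000L));   // idle longer than the timeout
    }
}
```

The upper bound also stands in for the case where the cluster cannot grant more resources: the application keeps running with what it has rather than failing the query. How the real systems behave in that case is exactly what the issue asks to research.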