Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Sat, 9 Aug 2014 00:14:12 +0000 (UTC)
From: "Szehon Ho (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12726697.1405029520330.47701.1407543252325@arcas>
In-Reply-To: <JIRA.12726697.1405029520330@arcas>
References: <JIRA.12726697.1405029520330@arcas>
Subject: [jira] [Commented] (HIVE-7382) Create a MiniSparkCluster and set up
 a testing framework
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091486#comment-14091486 ] 

Szehon Ho commented on HIVE-7382:
---------------------------------

This would be similar to HIVE-7665, except that it would be done by setting {{spark.master=local-cluster}} instead of {{local}}.
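
For reference, Spark's local-cluster master URL takes the form local-cluster[numWorkers, coresPerWorker, memoryPerWorkerMB], whereas local (or local[N]) runs the driver and executor in a single JVM. The sketch below shows how a test harness might tell the two modes apart. The helper parse_master and the MasterSpec type are hypothetical; they are not part of Hive or Spark. The regular expressions only approximate the formats Spark accepts.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Approximates Spark's local-cluster master URL:
#   local-cluster[numWorkers, coresPerWorker, memoryPerWorkerMB]
LOCAL_CLUSTER_RE = re.compile(r"local-cluster\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")
# Approximates plain local mode: "local", "local[4]", "local[*]"
LOCAL_RE = re.compile(r"local(?:\[(\d+|\*)\])?")


@dataclass
class MasterSpec:  # hypothetical type, for illustration only
    mode: str                                   # "local" or "local-cluster"
    workers: int                                # workers requested (0 for plain local mode)
    cores_per_worker: Optional[int] = None
    memory_mb_per_worker: Optional[int] = None


def parse_master(master: str) -> MasterSpec:
    """Classify a spark.master value (hypothetical helper, not a Spark API)."""
    m = LOCAL_CLUSTER_RE.fullmatch(master)
    if m:
        workers, cores, mem = (int(g) for g in m.groups())
        return MasterSpec("local-cluster", workers, cores, mem)
    if LOCAL_RE.fullmatch(master):
        # Local mode: driver and executor share one JVM, so no workers are started.
        return MasterSpec("local", 0)
    raise ValueError(f"unsupported master for this sketch: {master!r}")
```

For example, parse_master("local-cluster[2,1,1024]") asks for two workers with one core and 1024 MB each, while parse_master("local") reports plain local mode with no separate workers.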

However, although local-cluster mode is used by Spark's own unit tests, it is not publicly exposed in Spark. When I tried setting it, I got the following error in Master.scala: {noformat}java.io.IOException: Cannot run program "/home/szehon/repos/apache-hive/hive/itests/spark-qtest/./bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory{noformat}

The error that ultimately surfaces is "ApplicationRemoved(FAILED)".

I think HIVE-7665 may serve our use case for now and unblock testing, as this might be a bit more involved. From talking with folks, it seems that even running against a local Spark master will catch most issues (including serialization issues).

> Create a MiniSparkCluster and set up a testing framework
> --------------------------------------------------------
>
>                 Key: HIVE-7382
>                 URL: https://issues.apache.org/jira/browse/HIVE-7382
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>
> To automatically test Hive functionality over the Spark execution engine, we need to create a test framework that can execute Hive queries with Spark as the backend. For that, we should create a MiniSparkCluster, similar to those for other execution engines.
> Spark has a way to create a local cluster of a few processes on the local machine, each process acting as a worker node. It's fairly close to a real Spark cluster. Our mini cluster can be based on that.
> For more info, please refer to the design doc on the wiki.


--
This message was sent by Atlassian JIRA
(v6.2#6252)