hive-issues mailing list archives

From "Peter Vary (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17532) Hive on Spark query compilation starts Spark session
Date Thu, 14 Sep 2017 13:06:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166272#comment-16166272 ]

Peter Vary commented on HIVE-17532:
-----------------------------------

[~pcsaszar]: We recently discussed the same topic in HIVE-17291.
The decision there was that a production cluster should use dynamic allocation, and that
in other setups it is better to use the actual number of cores than the configured one.

Thanks,
Peter

> Hive on Spark query compilation starts Spark session
> ----------------------------------------------------
>
>                 Key: HIVE-17532
>                 URL: https://issues.apache.org/jira/browse/HIVE-17532
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 2.2.0
>            Reporter: Peter Csaszar
>            Priority: Minor
>
> Hive on Spark query compilation starts a new Spark session when some kind of aggregation is present:
> 0: jdbc:hive2://localhost:10000/default> set hive.execution.engine=spark;
> No rows affected (0.013 seconds)
> 0: jdbc:hive2://localhost:10000/default> explain select distinct label0 from iris;
> INFO  : Compiling command(queryId=hive_20170912151212_914ee322-28dd-442a-9dd9-7ed00a6a8caf): explain select distinct label0 from iris
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
> INFO  : Completed compiling command(queryId=hive_20170912151212_914ee322-28dd-442a-9dd9-7ed00a6a8caf); Time taken: *40.594* seconds
> Spark job started, all consecutive explain statements are fast:
> 0: jdbc:hive2://localhost:10000/default> explain select distinct a1 from iris;
> INFO  : Compiling command(queryId=hive_20170912151414_faacda24-290e-48bb-9daf-3f301fc170c1): explain select distinct label0 from iris
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
> INFO  : Completed compiling command(queryId=hive_20170912151414_faacda24-290e-48bb-9daf-3f301fc170c1); Time taken: *0.275* seconds
> Killing the Spark job, the same query is still fast, and no new Spark job has been started:
> 0: jdbc:hive2://localhost:10000/default> explain select distinct a2 from iris;
> INFO  : Compiling command(queryId=hive_20170912151616_a7ea83b6-03ce-4636-b3d4-be6feadcde35): explain select distinct label0 from iris
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
> INFO  : Completed compiling command(queryId=hive_20170912151616_a7ea83b6-03ce-4636-b3d4-be6feadcde35); Time taken: *0.213* seconds
> The code in question:
> SetSparkReducerParallelism.java:
> sparkSessionManager = SparkSessionManagerImpl.getInstance();
> sparkSession = SparkUtilities.getSparkSession(context.getConf(), sparkSessionManager);
> sparkMemoryAndCores = sparkSession.getMemoryAndCores();
> The created Spark session is used for getting the number of cores and memory only. This could be determined from the configurations, without actually starting a session.
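
The suggestion at the end of the issue description, reading cores and memory from the configuration instead of from a live session, could look roughly like the sketch below. This is an illustration only, not Hive code: the class ConfiguredSparkResources and its methods are hypothetical, the configuration is modelled as a plain Map, and the fallback values are assumptions. Only the property names (spark.executor.instances, spark.executor.cores, spark.executor.memory) are standard Spark settings. Note that configured values can differ from what the cluster actually grants, especially with dynamic allocation, which is the trade-off raised in the comment above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hive): estimates Spark resources from configuration only. */
final class ConfiguredSparkResources {
    final long memoryPerExecutorMb;
    final int totalCores;

    ConfiguredSparkResources(long memoryPerExecutorMb, int totalCores) {
        this.memoryPerExecutorMb = memoryPerExecutorMb;
        this.totalCores = totalCores;
    }

    /** Reads the executor settings from a plain map; no Spark session is started. */
    static ConfiguredSparkResources fromConf(Map<String, String> conf) {
        // The fallback values below are assumptions made for this sketch.
        int instances = Integer.parseInt(conf.getOrDefault("spark.executor.instances", "2"));
        int coresPerExecutor = Integer.parseInt(conf.getOrDefault("spark.executor.cores", "1"));
        long memoryMb = parseMemoryMb(conf.getOrDefault("spark.executor.memory", "1g"));
        return new ConfiguredSparkResources(memoryMb, instances * coresPerExecutor);
    }

    /** Simplified parser: accepts only values with an m, g, or t suffix, such as "512m" or "4g". */
    static long parseMemoryMb(String value) {
        String v = value.trim().toLowerCase();
        if (v.length() < 2) {
            throw new IllegalArgumentException("Unsupported memory value: " + value);
        }
        long number = Long.parseLong(v.substring(0, v.length() - 1));
        switch (v.charAt(v.length() - 1)) {
            case 'm': return number;
            case 'g': return number * 1024L;
            case 't': return number * 1024L * 1024L;
            default: throw new IllegalArgumentException("Unsupported memory value: " + value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.executor.instances", "4");
        conf.put("spark.executor.cores", "3");
        conf.put("spark.executor.memory", "2g");
        ConfiguredSparkResources r = fromConf(conf);
        System.out.println("memory per executor (MB): " + r.memoryPerExecutorMb);
        System.out.println("total cores: " + r.totalCores);
    }
}
```

A configuration-only estimate like this avoids the session start-up cost seen in the first explain, at the price of possibly not matching the resources the cluster really provides.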



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
