hive-user mailing list archives

From 明浩 冯 <qiuff...@hotmail.com>
Subject Re: hive on spark job not start enough executors
Date Fri, 09 Sep 2016 09:08:28 GMT
All the parameters except spark.executor.instances are specified in spark-defaults.conf, located
in Hive's conf folder. So I think the answer is yes.

I also checked the Spark web UI while a Hive on Spark job was running; the parameters shown
there are exactly what I specified in the config file, including spark.shuffle.service.enabled
and spark.dynamicAllocation.enabled.


Should I specify a fixed spark.executor.instances in the file? That would not work well for me, though.
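For reference, if I did have to pin the executor count, I suppose it could also be set per Hive session instead of in the file, along these lines (the value 20 is only an illustration, not something I actually use):

    -- illustrative value only
    set spark.executor.instances=20;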


By the way, the data source of my query is Parquet files. On the Hive side I just created an external
table over the Parquet data.
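The table was created with a statement roughly like the following (the table name, columns, and location are placeholders, not my actual schema):

    -- placeholder table name, columns, and path
    CREATE EXTERNAL TABLE my_parquet_table (id BIGINT, name STRING)
    STORED AS PARQUET
    LOCATION '/path/to/parquet/files';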



Thanks,

Minghao Feng

________________________________
From: Mich Talebzadeh <mich.talebzadeh@gmail.com>
Sent: Friday, September 9, 2016 4:49:55 PM
To: user
Subject: Re: hive on spark job not start enough executors

When you start Hive on Spark, do you set any parameters for the submitted job (or read them
from an init file)?

set spark.master=yarn;
set spark.deploy.mode=client;
set spark.executor.memory=3g;
set spark.driver.memory=3g;
set spark.executor.instances=2;
set spark.ui.port=7777;


Dr Mich Talebzadeh



LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction
of data or any other property which may arise from relying on this email's technical content
is explicitly disclaimed. The author will in no case be liable for any monetary damages arising
from such loss, damage or destruction.



On 9 September 2016 at 09:30, Minghao Feng <qiuffeng@hotmail.com> wrote:

Hi there,


I encountered a problem that gives Hive on Spark very low performance.

I'm using Spark 1.6.2 and Hive 2.1.0, and I specified


    spark.shuffle.service.enabled    true
    spark.dynamicAllocation.enabled  true

in my spark-defaults.conf file (the file is in both the Spark and Hive conf folders) so that Spark
jobs can acquire executors dynamically.
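The relevant part of my spark-defaults.conf looks roughly like this (the min/max executor bounds below are only illustrative assumptions; the two enabled flags are the ones I actually set):

    # the two properties I actually set
    spark.shuffle.service.enabled           true
    spark.dynamicAllocation.enabled         true
    # illustrative bounds, not from my actual file
    spark.dynamicAllocation.minExecutors    1
    spark.dynamicAllocation.maxExecutors    20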
The configuration works correctly when I run Spark jobs directly, but when I use Hive on Spark,
it only starts a few executors even though there are more than enough cores and memory to start
more.
For example, for the same SQL query, Spark SQL can start more than 20 executors, but Hive on
Spark starts only 3.

How can I improve the performance of Hive on Spark? Any suggestions would be appreciated.

Thanks,
Minghao Feng

