kylin-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-4829) Support to use thread-level SparkSession to execute query
Date Fri, 04 Dec 2020 07:57:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243804#comment-17243804 ]

ASF GitHub Bot commented on KYLIN-4829:
---------------------------------------

zzcclp commented on pull request #1495:
URL: https://github.com/apache/kylin/pull/1495#issuecomment-738628576


   ## Manual Test Results
   
   ### Test Env
   
   - Hadoop 2.7.0 on Docker.
   - Commit: [3b3786c5c](https://github.com/apache/kylin/commit/3b3786c5c9602838cd4abd0a6d40574550ec8622)
   - Sparder Env:
      - spark.executor.cores=1
      - spark.executor.instances=4
      - spark.executor.memory=2G
      - spark.executor.memoryOverhead=1G
      - spark.sql.shuffle.partitions=4
   
   
   ### Before this patch
   The shuffle partition number of every query is 4, which equals the total number of executor cores (4 executors with 1 core each).
   ![image](https://user-images.githubusercontent.com/9430290/101136306-19016880-3648-11eb-8ae0-2e02d42a41ac.png)
   
   ![image](https://user-images.githubusercontent.com/9430290/101136373-32a2b000-3648-11eb-83dd-83b52e2d9980.png)
   
   ![image](https://user-images.githubusercontent.com/9430290/101136443-4e0dbb00-3648-11eb-8d31-ac721797ee94.png)
   
   ![image](https://user-images.githubusercontent.com/9430290/101136476-5a921380-3648-11eb-90f7-ec20faeca57b.png)
   
   
   ### After this patch
   The shuffle partition number of each query is calculated from the number of bytes that query scans:
   ![image](https://user-images.githubusercontent.com/9430290/101136174-e3f51600-3647-11eb-99c9-290831bb30af.png)
   
   ![image](https://user-images.githubusercontent.com/9430290/101136210-f40cf580-3647-11eb-8eec-0b7bdef93c30.png)
   
   ![image](https://user-images.githubusercontent.com/9430290/101136227-fa02d680-3647-11eb-9001-128c0e3bd490.png)
   
   ![image](https://user-images.githubusercontent.com/9430290/101136249-05ee9880-3648-11eb-9267-c1cb44319697.png)
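
The comment does not state the exact formula used to derive the partition number from the scanned bytes. As a rough sketch only, one plausible heuristic is to allot one shuffle partition per fixed-size slice of scanned data, capped at the total number of cores and never below 1. The function name `shuffle_partitions` and the 64 MiB default split size are assumptions made for illustration; they are not taken from the patch.

```python
MIB = 1024 * 1024


def shuffle_partitions(scanned_bytes: int, total_cores: int, split_size_mb: int = 64) -> int:
    """Hypothetical heuristic, not the patch's actual code.

    One partition per `split_size_mb` of scanned data, capped at
    `total_cores`, with a minimum of one partition.
    """
    if scanned_bytes < 0 or total_cores < 1 or split_size_mb < 1:
        raise ValueError("scanned_bytes must be >= 0; total_cores and split_size_mb must be >= 1")
    split_bytes = split_size_mb * MIB
    # Integer ceiling division avoids floating-point rounding on large inputs.
    by_size = -(-scanned_bytes // split_bytes)
    return max(1, min(by_size, total_cores))


if __name__ == "__main__":
    # Test env above: 4 executors x 1 core = 4 total cores.
    for size in (10 * MIB, 100 * MIB, 10 * 1024 * MIB):
        print(size, shuffle_partitions(size, total_cores=4))
```

Under this assumed heuristic a small query gets a single shuffle partition instead of the fixed 4, while a large scan is still bounded by the 4 cores of the test environment.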
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Support to use thread-level SparkSession to execute query 
> ----------------------------------------------------------
>
>                 Key: KYLIN-4829
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4829
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine, Spark Engine
>            Reporter: Zhichao  Zhang
>            Assignee: Zhichao  Zhang
>            Priority: Minor
>             Fix For: v4.0.0-beta
>
>
> Currently, when a query is executed, it is impossible to configure suitable parameters (such as spark.sql.shuffle.partitions) for that query according to the amount of data it will scan. This hurts query performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
