kylin-issues mailing list archives

From "Shaofeng SHI (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (KYLIN-3123) Improve Spark Cubing
Date Tue, 02 Jan 2018 02:37:00 GMT

     [ https://issues.apache.org/jira/browse/KYLIN-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shaofeng SHI closed KYLIN-3123.
-------------------------------
    Resolution: Incomplete

> Improve Spark Cubing
> --------------------
>
>                 Key: KYLIN-3123
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3123
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Spark Engine
>    Affects Versions: v2.2.0
>         Environment: HDP, HBase, Spark 2.6, CentOS 7
>            Reporter: vu thanh dat
>              Labels: beginner
>             Fix For: v2.2.0
>
>         Attachments: dimension.bmp, measures.bmp, rowkeys.bmp, spark_so_slow_2.bmp
>
>
> Hi all,
> I'm using Spark to build a Kylin cube.
> The data is about 13 million rows per build step. The cube is partitioned by date and has 10 dimensions and no measures.
> I set the following config:
> kylin.storage.hbase.compression-codec=snappy
> kylin.engine.spark.rdd-partition-cut-mb=1000
> kylin.engine.spark.max-partition=5000
> kylin.engine.spark-conf.spark.master=yarn
> kylin.engine.spark-conf.spark.submit.deployMode=cluster
> kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
> kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=100
> kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=10240
> kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
> kylin.engine.spark-conf.spark.shuffle.service.enabled=true
> kylin.engine.spark-conf.spark.shuffle.service.port=7337
> kylin.engine.spark-conf.spark.yarn.queue=default
> kylin.engine.spark-conf.spark.executor.memory=4G
> kylin.engine.spark-conf.spark.executor.cores=4
> The Build Cube with Spark step is very slow: it takes about 1 hour. Can you show me how to customize
> the Kylin config to speed up this step? I have 30 CentOS servers, 5.87T of storage and 448 cores.
> I have attached my config.
> Best regards and thanks!
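
A rough sanity check of the quoted settings can be sketched in Python. The first helper only divides the reported 448 cores by spark.executor.cores, ignoring memory limits and other YARN tenants. The second assumes that the Spark engine derives the RDD partition count by dividing an estimated data size by kylin.engine.spark.rdd-partition-cut-mb and clamping the result to a minimum and kylin.engine.spark.max-partition; that model, the 20000 MB size and the minimum of 1 are illustrative assumptions, not values taken from this issue.

```python
# Values quoted in the report above.
TOTAL_CORES = 448        # cores the reporter says are available
EXECUTOR_CORES = 4       # spark.executor.cores
MAX_EXECUTORS = 10240    # spark.dynamicAllocation.maxExecutors
CUT_MB = 1000            # kylin.engine.spark.rdd-partition-cut-mb
MAX_PARTITION = 5000     # kylin.engine.spark.max-partition


def max_concurrent_executors(total_cores: int, cores_per_executor: int) -> int:
    """Upper bound on executors running at once, by CPU only."""
    return total_cores // cores_per_executor


def estimate_partitions(size_mb: float, cut_mb: float,
                        min_partition: int, max_partition: int) -> int:
    """Assumed model: size / cut, clamped to [min_partition, max_partition]."""
    raw = int(size_mb / cut_mb)
    return max(min_partition, min(max_partition, raw))


if __name__ == "__main__":
    cap = max_concurrent_executors(TOTAL_CORES, EXECUTOR_CORES)
    print("executors the cores can host:", cap)
    print("configured maxExecutors:", MAX_EXECUTORS)

    # Hypothetical 20000 MB cuboid layer; min_partition=1 is also an assumption.
    for cut in (CUT_MB, 10):
        parts = estimate_partitions(20000, cut, 1, MAX_PARTITION)
        print(f"cut={cut} MB -> {parts} partitions")
```

Under these assumptions, the cores can host far fewer executors than maxExecutors allows, and a 1000 MB cut yields far fewer partitions than a smaller cut would, which limits how much of the cluster one build step can use.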



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
