kylin-user mailing list archives

From "Kumar, Manoj H" <manoj.h.ku...@jpmorgan.com>
Subject RE: optimal parameters
Date Sat, 03 Feb 2018 05:09:51 GMT
Thanks for your input. Is there any other way to get 80+ dimensions into one cube?

Can we split the cube into smaller cubes of 20 dimensions each?

Cube1: 20 dimensions
Cube2: 20 dimensions

A query should then take data from both cubes (Cube1 + Cube2), so that Tableau has 40
dimensions in one worksheet. Please advise.

Regards,
Manoj

From: ShaoFeng Shi [mailto:shaofengshi@apache.org]
Sent: Friday, February 02, 2018 4:09 PM
To: user <user@kylin.apache.org>
Subject: Re: optimal parameters

Hi Manoj,


450 million records in one build is a common case for Kylin. But 80+ dimensions is too many,
because by default the cube will have 2^N dimension combinations (where N is the number of
dimensions). I assume you have already optimized the aggregation groups, since by default
Kylin only allows 2048 combinations in one cube.
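
To put the 2^N figure in perspective, here is a rough Python sketch (not part of the original reply). The function names are hypothetical, and the grouped estimate is a simplified upper bound: it assumes the dimensions are split into independent aggregation groups and ignores mandatory, hierarchy, and joint dimensions as well as any overlap between groups.

```python
def cuboid_count(num_dimensions: int) -> int:
    """Dimension combinations of a full cube over N dimensions: 2^N."""
    return 2 ** num_dimensions


def grouped_upper_bound(group_sizes) -> int:
    """Simplified upper bound when dimensions are split into independent groups.

    Each group of k dimensions contributes at most 2^k combinations;
    overlap between groups is ignored (assumption).
    """
    return sum(2 ** k for k in group_sizes)


if __name__ == "__main__":
    # 80 dimensions in a single, unrestricted cube
    print(f"full cube, 80 dims:      {cuboid_count(80):,}")
    # The same 80 dimensions split into 8 hypothetical groups of 10
    print(f"8 groups of 10 dims:     {grouped_upper_bound([10] * 8):,}")
    # 11 dimensions already reach the 2048 limit mentioned above
    print(f"full cube, 11 dims:      {cuboid_count(11):,}")
```

The gap between the first two numbers is the reason a cube with this many dimensions is only practical when the combinations are restricted.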

If the build is very slow, a possible reason is the cluster's capacity. Please try
a smaller data set with a simpler cube first, and then scale up based on the performance.

2018-02-02 18:17 GMT+08:00 Kumar, Manoj H <manoj.h.kumar@jpmorgan.com>:
Any updates on this? How should we process 450 million records in one partition? The fact
table has this much data for one COB.

Regards,
Manoj

From: Kumar, Manoj H
Sent: Friday, February 02, 2018 11:45 AM
To: user@kylin.apache.org
Subject: optimal parameters
Importance: High

Hi folks, I need your input on optimizing the Kylin cube build process. We have approximately
450 million records in one partition and 80-90 dimensions to be picked up from the tables.
Can you please advise? What would be the optimal way of running the jobs? We have a Cloudera
cluster of 16 nodes, with 8 cores per node.

This process has been running for 60 minutes.

2018-02-01 23:54:16,257 INFO  [pool-9-thread-1] threadpool.DefaultScheduler:116 : CubingJob{id=2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd, name=BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11, state=READY} scheduled
2018-02-01 23:54:16,258 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11)
2018-02-01 23:54:16,263 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd from READY to RUNNING
2018-02-01 23:54:16,271 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (Extract Fact Table Distinct Columns)
2018-02-01 23:54:16,275 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-02 from READY to RUNNING
2018-02-01 23:54:16,358 INFO  [pool-9-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 1 actual running, 0 stopped, 1 ready, 86 already succeed, 47 error, 0 discarded, 0 others
2018-02-01 23:54:16,371 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.MapReduceExecutable:115 : parameters of the MapReduceExecutable:  -conf /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/conf/kylin_job_conf.xml -cubename Deposits -output hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns -segmentid da273eda-45ea-4c72-816c-709c8a61df16 -statisticsenabled true -statisticsoutput hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns/statistics -statisticssamplingpercent 100 -jobname Kylin_Fact_Distinct_Columns_Deposits_Step -cubingJobId 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
2018-02-01 23:54:16,424 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] steps.FactDistinctColumnsJob:106 : Starting: Kylin_Fact_Distinct_Columns_Deposits_Step
2018-02-01 23:54:16,775 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:386 : Trying to connect to metastore with URI thrift://bdtpisr3n1.svr.us.jpmchase.net:9083
2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:431 : Opened a connection to metastore, current connections: 3
2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:483 : Connected to metastore.
2018-02-01 23:54:17,345 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.KylinConfigBase:162 : Kylin Config was updated with kylin.metadata.url : /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,347 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] persistence.ResourceStore:79 : Using metadata url /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta for resource store
2018-02-01 23:54:17,354 DEBUG [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:547 : Dump resources to /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta took 9 ms
2018-02-01 23:54:17,354 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:505 : HDFS meta dir is: file:///apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,470 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hdfs.DFSClient:1086 : Created token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo_nd@NAEAST.AD.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921 on ha-hdfs:sfpdev
2018-02-01 23:54:17,471 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] security.TokenCache:144 : Got dt for hdfs://sfpdev; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo_nd@NAEAST.AD.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
2018-02-01 23:54:17,478 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] client.ConfiguredRMFailoverProxyProvider:100 : Failing over to rm76
2018-02-01 23:54:18,864 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapred.FileInputFormat:249 : Total input paths to process : 482
2018-02-01 23:54:19,518 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:202 : number of splits:482
2018-02-01 23:54:19,566 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:291 : Submitting tokens for job: job_1516848187601_12793
2018-02-01 23:54:19,566 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:293 : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo_nd@NAEAST.AD.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
2018-02-01 23:54:19,821 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] impl.YarnClientImpl:260 : Submitted application application_1516848187601_12793
2018-02-01 23:54:19,825 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.Job:1311 : The url to track the job: http://bdtpisr3n2.svr.us.jpmchase.net:8088/proxy/applicatio



Please also advise on the Spark parameters.

kylin.engine.mr.reduce-input-mb=400
#kylin.engine.mr.max-reducer-number=300
kylin.engine.mr.mapper-input-rows=500000
#kylin.engine.mr.build-dict-in-reducer=true
kylin.engine.mr.uhc-reducer-count=2
#### CUBE | DICTIONARY ###
kylin.cube.algorithm=inmem
## A smaller threshold prefers layer, a larger threshold prefers in-mem
#kylin.cube.algorithm.layer-or-inmem-threshold=7
kylin.cube.aggrgroup.max-combination=61440
kylin.snapshot.max-mb=1500



kylin.engine.spark.rdd-partition-cut-mb=800
kylin.engine.spark.min-partition=1
## Max partition numbers of rdd
kylin.engine.spark.max-partition=500
kylin.engine.spark-conf.spark.yarn.queue=XXXX
kylin.engine.spark-conf.spark.executor.memory=8G
kylin.engine.spark-conf.spark.executor.cores=6
kylin.engine.spark-conf.spark.executor.instances=10
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=XXXX
kylin.engine.spark-conf.spark.history.fs.logDirectory=XXXX
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
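
As a rough sanity check of the Spark executor settings above against the 16-node, 8-core cluster, the arithmetic can be sketched in Python. This is only an illustration, not part of the original message: the function name is hypothetical, it assumes every core on every node is available to YARN, and it ignores the driver, memory overhead, and other jobs in the queue.

```python
def spark_footprint(nodes, cores_per_node, executor_instances,
                    executor_cores, executor_memory_gb):
    """Compare requested Spark executor resources with raw cluster capacity."""
    total_cores = nodes * cores_per_node
    requested_cores = executor_instances * executor_cores
    return {
        "total_cores": total_cores,
        "requested_cores": requested_cores,
        # Fraction of all cluster cores the executors would occupy
        "core_share": requested_cores / total_cores,
        # Sum of executor heap sizes only (no overhead, no driver)
        "executor_heap_gb": executor_instances * executor_memory_gb,
        # How many executors of this size fit on one node by core count
        "max_executors_per_node": cores_per_node // executor_cores,
    }


if __name__ == "__main__":
    # Values taken from the cluster description and the config above
    footprint = spark_footprint(nodes=16, cores_per_node=8,
                                executor_instances=10, executor_cores=6,
                                executor_memory_gb=8)
    for key, value in footprint.items():
        print(f"{key}: {value}")
```

Under these assumptions the executors would use fewer than half of the cluster's cores, and a 6-core executor leaves 2 cores per node that no second executor of the same size can use.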

Regards,
Manoj


This message is confidential and subject to terms at: http://www.jpmorgan.com/emaildisclaimer
including on confidentiality, legal privilege, viruses and monitoring of electronic messages.
If you are not the intended recipient, please delete this message and notify the sender immediately.
Any unauthorized use is strictly prohibited.



--
Best regards,

Shaofeng Shi 史少锋

