kylin-user mailing list archives

From Kang-Sen Lu <...@anovadata.com>
Subject RE: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?
Date Mon, 10 Dec 2018 20:12:34 GMT
Hi, Chao:

Did you set “kylin.source.hive.flat-table-storage-format” to SEQUENCEFILE or to some other value? If I set it to SEQUENCEFILE, the cube build is OK. I will try your suggestion and see if it works.

I am seeing another problem with the Spark cube build. At step 3, I saw some executors fail, and I am wondering how to find the root cause. Here is the log from stderr; a note on digging further follows the log:

2018-12-10 18:44:39,776 INFO scheduler.TaskSetManager: Finished task 28.0 in stage 0.0 (TID 21) in 46644 ms on hadoop3 (executor 34) (28/31)
2018-12-10 18:44:39,783 INFO yarn.YarnAllocator: Driver requested a total number of 3 executor(s).
2018-12-10 18:46:02,679 INFO scheduler.TaskSetManager: Finished task 21.0 in stage 0.0 (TID 19) in 129550 ms on hadoop5 (executor 18) (29/31)
2018-12-10 18:46:02,778 INFO yarn.YarnAllocator: Driver requested a total number of 2 executor(s).
2018-12-10 18:53:55,116 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 21.
2018-12-10 18:53:55,125 INFO scheduler.DAGScheduler: Executor lost: 21 (epoch 0)
2018-12-10 18:53:55,126 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 21 from BlockManagerMaster.
2018-12-10 18:53:55,128 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(21, hadoop2, 40455, None)
2018-12-10 18:53:55,129 INFO storage.BlockManagerMaster: Removed 21 successfully in removeExecutor
2018-12-10 18:53:55,330 INFO yarn.YarnAllocator: Completed container container_1544204485929_0069_01_000022 on host: hadoop2 (state: COMPLETE, exit status: 143)
2018-12-10 18:53:55,333 WARN yarn.YarnAllocator: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

2018-12-10 18:53:55,338 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

2018-12-10 18:53:55,342 ERROR cluster.YarnClusterScheduler: Lost executor 21 on hadoop2: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

2018-12-10 18:53:55,348 WARN scheduler.TaskSetManager: Lost task 29.0 in stage 0.0 (TID 26, hadoop2, executor 21): ExecutorLostFailure (executor 21 exited caused by one of the running tasks) Reason: Container marked as failed: container_1544204485929_0069_01_000022 on host: hadoop2. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

2018-12-10 18:53:55,351 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 21 from BlockManagerMaster.
2018-12-10 18:53:55,351 INFO storage.BlockManagerMaster: Removal of executor 21 requested
2018-12-10 18:53:55,352 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 21
2018-12-10 18:53:55,360 INFO spark.ExecutorAllocationManager: Existing executor 21 has been removed (new total is 39)
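
A note on digging further: exit status 143 is 128 + 15, i.e. the container was killed with SIGTERM, which on YARN usually means the NodeManager killed the container (often for exceeding its memory limit) rather than a crash inside the task. Assuming log aggregation is enabled, the full logs can be pulled with the standard YARN CLI; the application id comes from the container name above:

yarn logs -applicationId application_1544204485929_0069 > app.log    # aggregated logs of all containers
grep -n -i -B2 -A5 "killing container\|beyond.*memory" app.log       # typical NodeManager kill messages

If memory turns out to be the cause, raising kylin.engine.spark-conf.spark.executor.memory or spark.yarn.executor.memoryOverhead in kylin.properties is the usual remedy.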

Thanks.

Kang-sen

From: Chao Long <wayne.l@qq.com>
Sent: Monday, December 10, 2018 11:15 AM
To: user <user@kylin.apache.org>
Subject: Re: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?

Hi Kang-Sen,
    In my environment (hdp-2.4.0.0-169, hive-1.2.1000.2.4.0.0-169), copying /usr/hdp/2.4.0.0-169/hive/conf/hive-site.xml to $KYLIN_HOME/spark/conf fixes the problem "table not found in database". So I think this could be an environment issue.

   You can try running the Spark execution cmd manually in the CLI (adding --files /xx/xx/hive-site.xml) and see if you get the same error.
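
   A minimal sketch of such a manual run, with placeholder paths (the full command, including all of Kylin's --conf flags, is logged in kylin.log, as shown further down this thread):

   export HADOOP_CONF_DIR=/etc/hadoop/conf
   $KYLIN_HOME/spark/bin/spark-submit \
     --class org.apache.kylin.common.util.SparkEntry \
     --master yarn \
     --deploy-mode cluster \
     --files /xx/xx/hive-site.xml \
     $KYLIN_HOME/lib/kylin-job-<version>.jar \
     -className org.apache.kylin.engine.spark.SparkFactDistinct ...

   The trailing arguments (-counterOutput, -hiveTable, and so on) should be copied verbatim from the failed step's log.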

------------------
Best Regards,
Chao Long

------------------ Original Message ------------------
From: "Kang-Sen Lu" <klu@anovadata.com>;
Sent: Monday, December 10, 2018, 8:58 PM
To: "user@kylin.apache.org" <user@kylin.apache.org>;
Subject: RE: Re: RE: anybody used spark to build cube in kylin 2.5.1?

Hi, Chao: (I hope I got your first name correctly.)

Thanks for the reply. I see that KYLIN-3699 was opened to address this problem.

I believe no bug has been opened for the problem that only SEQUENCEFILE is supported for the Spark cube build. Right?

Kang-sen

From: Chao Long <wayne.l@qq.com>
Sent: Sunday, December 09, 2018 11:50 AM
To: user <user@kylin.apache.org>
Subject: Re: RE: anybody used spark to build cube in kylin 2.5.1?

Hi KangSen,
   There is a known JIRA issue about Spark cubing failing at step 7 when there is no input data:
   https://issues.apache.org/jira/browse/KYLIN-3699

------------------
Best Regards,
Chao Long
------------------ Original Message ------------------
From: "Kang-Sen Lu" <klu@anovadata.com>;
Sent: Saturday, December 8, 2018, 5:32 AM
To: "user@kylin.apache.org" <user@kylin.apache.org>;
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

I am able to build the cube with Spark. I am using Kylin 2.5.1 and Hive 1.2.1000.2.5.6.0-40.

I need to set “kylin.source.hive.flat-table-storage-format=SEQUENCEFILE” in kylin.properties.

In addition, if I build a cube at a time when there is no input data, the build fails at step 7. Otherwise, it works OK.

Thanks.

Kang-sen


From: Kang-Sen Lu <klu@anovadata.com>
Sent: Friday, December 07, 2018 11:35 AM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

The Spark cube build does not correctly support non-SEQUENCEFILE formats.

In my kylin.properties, I changed from:
kylin.source.hive.flat-table-storage-format=TEXTFILE
to:
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE

Then restarted kylin.
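For reference, the restart is just the standard Kylin control script ($KYLIN_HOME assumed to point at the install dir):

$KYLIN_HOME/bin/kylin.sh stop
$KYLIN_HOME/bin/kylin.sh start
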
The Spark cube build passed step 3 and failed at step 7:
#7 Step Name: Build Cube with Spark
Duration: 1.45 mins Waiting: 0 seconds

The error is the same as reported by KYLIN-3699.
https://issues.apache.org/jira/browse/KYLIN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711042#comment-16711042

Thanks.

Kang-sen

From: Kang-Sen Lu <klu@anovadata.com>
Sent: Thursday, December 06, 2018 2:11 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

Hi, Shaofeng:

I compared the Spark execution cmd logged in my kylin.log file with the one included in the Kylin doc, “Build Cube with Spark”, and I can see that my cmd is missing this option:

“--files /etc/hbase/2.4.0.0-169/0/hbase-site.xml”

Here is my cmd:

2018-12-06 11:50:02,665 INFO  [Scheduler 1026601642 Job 2d710968-60d4-bacb-a7d7-c63ac42e92f0-328] spark.SparkExecutable:261 : cmd: export HADOOP_CONF_DIR=/usr/hdp/2.5.6.0-40/hadoop/conf && /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.cores=1  --conf spark.hadoop.yarn.timeline-service.enabled=false  --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec  --conf spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.master=yarn  --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true  --conf spark.executor.instances=40  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.executor.memory=4G  --conf spark.yarn.queue=default  --conf spark.submit.deployMode=cluster  --conf spark.dynamicAllocation.minExecutors=1  --conf spark.network.timeout=600  --conf spark.hadoop.dfs.replication=2  --conf spark.yarn.executor.memoryOverhead=1024  --conf spark.dynamicAllocation.executorIdleTimeout=300  --conf spark.history.fs.logDirectory=hdfs:///user/zettics/kylin/spark-history  --conf spark.driver.memory=2G  --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40  --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec  --conf spark.eventLog.enabled=true  --conf spark.shuffle.service.enabled=true  --conf spark.eventLog.dir=hdfs:///user/zettics/kylin/spark-eventLog  --conf spark.yarn.archive=hdfs:///user/zettics/spark/spark-libs.jar  --conf spark.dynamicAllocation.maxExecutors=1000  --conf spark.dynamicAllocation.enabled=true --jars /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar /home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/counter -statisticssamplingpercent 100 -cubename ma_aggs_topn_cube -hiveTable zetticsdw.kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -output hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/fact_distinct_columns -input hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/kylin_intermediate_ma_aggs_topn_cube_5d462857_8665_d5e8_a3a5_da9b1d461344 -segmentId 5d462857-8665-d5e8-a3a5-da9b1d461344 -metaUrl anova_kylin_25x_metadata@hdfs,path=hdfs://hadoop1.zettics.com:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-2d710968-60d4-bacb-a7d7-c63ac42e92f0/ma_aggs_topn_cube/metadata

And this is the cmd in the Kylin doc:

2017-03-06 14:44:38,574 INFO  [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/apache-kylin-2.4.0-bin-hbase1x/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.instances=1  --conf spark.yarn.queue=default  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current  --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history  --conf spark.driver.extraJavaOptions=-Dhdp.version=current --conf spark.master=yarn  --conf spark.executor.extraJavaOptions=-Dhdp.version=current  --conf spark.executor.memory=1G  --conf spark.eventLog.enabled=true  --conf spark.eventLog.dir=hdfs:///kylin/spark-history  --conf spark.executor.cores=2  --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml /usr/local/apache-kylin-2.4.0-bin-hbase1x/lib/kylin-job-2.4.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.4.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube

My question is: what config parameter could cause this difference?

Thanks.

Kang-sen

From: Kang-Sen Lu <klu@anovadata.com>
Sent: Wednesday, December 05, 2018 4:59 PM
To: user@kylin.apache.org
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

Hi, Shaofeng:

I have copied hive-site.xml into the …/spark/conf directory and set hive.metastore.uris and hive.metastore.warehouse.dir based on my Ambari Hive config data.

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://anovadata6.anovadata.local:5432/hive;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>/usr/lib/hive/lib/hive-hwi-.war</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://anovadata6.anovadata.local:9083</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/apps/hive/warehouse</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

But in the Spark run's stderr, I still see that Spark thinks the metastore is DERBY:

18/12/05 16:33:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY

Does that mean the cube-building Spark job somehow does not pick up hive-site.xml from the …/spark/conf dir?
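
A note on that symptom: "underlying DB is DERBY" means the driver fell back to an embedded Derby metastore, which is what happens when no hive-site.xml carrying hive.metastore.uris is visible on its classpath; the warehouse path under the YARN container directory in the log above points the same way. Since this job runs in yarn-cluster mode, the driver sits in a remote container, so a quick sanity check (paths assumed) is:

grep -A1 hive.metastore.uris $KYLIN_HOME/spark/conf/hive-site.xml     # what the client side holds
yarn logs -applicationId application_1543422353836_0099 | grep -c hive-site.xml   # was it shipped to the containers?

If nothing ships the file, passing it explicitly with --files (as suggested further up this thread) is the workaround.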

Kang-sen

From: Kang-Sen Lu <klu@anovadata.com<mailto:klu@anovadata.com>>
Sent: Wednesday, December 05, 2018 9:32 AM
To: user@kylin.apache.org<mailto:user@kylin.apache.org>
Subject: RE: anybody used spark to build cube in kylin 2.5.1?

Hi, Shaofeng:

I am not sure how to allow Spark to gain access to the Hive table that was built by Kylin.

I searched the internet for Spark and Hive integration, but failed to find a concrete example.

Anyway, I updated my kylin/spark/conf/hive-site.xml,

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:hive2://anovadata6.anovadata.local:10000/zetticsdw;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

And restarted Kylin. But I still get the following error:

18/12/05 08:32:51 INFO spark.SparkFactDistinct: RDD Output path: hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/fact_distinct_columns
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getTotalReducerNum: 5
18/12/05 08:32:51 INFO spark.SparkFactDistinct: getCuboidRowCounterReducerNum: 1
18/12/05 08:32:51 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-53610913-3f12-f20f-2fb8-22c6a69f8dcc/ma_aggs_topn_cube_test/counter
18/12/05 08:32:51 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/12/05 08:32:51 INFO internal.SharedState: Warehouse path is 'file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse'.
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d8fe3d4{/SQL,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fb4de12{/SQL/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32135371{/SQL/execution,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c25612d{/SQL/execution/json,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73a7b4d0{/static/sql,null,AVAILABLE,@Spark}
18/12/05 08:32:51 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/05 08:32:52 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/05 08:32:52 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/05 08:32:52 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/05 08:32:54 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:55 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:56 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/05 08:32:56 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/05 08:32:56 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/05 08:32:57 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=*
18/12/05 08:32:57 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/ca792b2d-e6c4-4a5d-b87a-cbce337612aa_resources
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/tmp/yarn/ca792b2d-e6c4-4a5d-b87a-cbce337612aa
18/12/05 08:32:57 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/ca792b2d-e6c4-4a5d-b87a-cbce337612aa/_tmp_space.db
18/12/05 08:32:57 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0099/container_e05_1543422353836_0099_01_000001/spark-warehouse
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp
18/12/05 08:32:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/05 08:32:57 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/05 08:32:57 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)

My question is: why is Spark not able to find the Hive metastore location?

If you have any pointer to a complete example of hive-site.xml for a Spark + Hive application, I would greatly appreciate it.
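
For what it's worth, a minimal hive-site.xml for a Spark application that only needs to reach an existing remote metastore usually needs just the thrift URI; the value below is taken from the config quoted earlier in this thread:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://anovadata6.anovadata.local:9083</value>
  </property>
</configuration>

The javax.jdo.* connection settings belong to the metastore service itself, and the jdbc:hive2:// URL tried above is a HiveServer2 URL, not a metastore DB URL, so neither should be needed here.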

Kang-sen

From: ShaoFeng Shi <shaofengshi@apache.org>
Sent: Monday, December 03, 2018 7:53 PM
To: user <user@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?

Just double-check it; the error message is clear. Do some searching on Spark + Hive.

If possible, we suggest using the sequence file (default config) for the intermediate hive table.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




Kang-Sen Lu <klu@anovadata.com> wrote on Monday, December 3, 2018 at 9:33 PM:
Hi, Shaofeng:

Thanks for the reply.

This is a line in my kylin.properties:

kylin.source.hive.flat-table-storage-format=TEXTFILE

I copied hive-site.xml into spark/conf and tried to resume the cube build.
(cp /etc/hive2/2.5.6.0-40/0/hive-site.xml spark/conf)

The cube build still failed; the stderr log is as follows:

18/12/03 08:27:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/03 08:27:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/03 08:27:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/03 08:27:02 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases
18/12/03 08:27:02 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/03 08:27:02 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=*
18/12/03 08:27:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/1dc9dffd-a306-4929-a387-833486436fb8_resources
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created local directory: /data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/tmp/yarn/1dc9dffd-a306-4929-a387-833486436fb8
18/12/03 08:27:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/1dc9dffd-a306-4929-a387-833486436fb8/_tmp_space.db
18/12/03 08:27:03 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/data1/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0098/container_e05_1543422353836_0098_02_000001/spark-warehouse
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: default
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp
18/12/03 08:27:03 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/12/03 08:27:03 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef
18/12/03 08:27:03 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
        at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
        at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
        at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
        at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
        at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
        ... 6 more
18/12/03 08:27:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_968d6f19_8f2b_38d7_69a2_bea7278058ef' not found in database 'zetticsdw';)
18/12/03 08:27:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/12/03 08:27:03 INFO server.ServerConnector: Stopped Spark@56c07a8e{HTTP/1.1}{0.0.0.0:0}



From: ShaoFeng Shi <shaofengshi@apache.org>
Sent: Sunday, December 02, 2018 2:04 AM
To: user <user@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?

Hi Kang-sen,

When the intermediate table's file format is not sequence file, Kylin uses the Hive catalog to parse the data into an RDD. In this case, it needs the "hive-site.xml" in the spark/conf folder. Please confirm whether this is the case; if true, put the file there and then try again.
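
On HDP that usually comes down to something like the following (the exact conf path varies per install, so treat it as an assumption), followed by a Kylin restart:

cp /usr/hdp/current/hive-client/conf/hive-site.xml $KYLIN_HOME/spark/conf/

The versioned path used later in this thread (/etc/hive2/2.5.6.0-40/0/hive-site.xml) is the same idea.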

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




Kang-Sen Lu <klu@anovadata.com> wrote on Saturday, December 1, 2018 at 12:30 AM:
Hi, SHaofeng:

Your suggestion made some progress. Now step 3 of the cube build goes further but shows another problem. Here is the stderr log:

18/11/30 11:14:20 INFO spark.SparkFactDistinct: counter path hdfs://anovadata6.anovadata.local:8020/user/zettics/kylin/25x/anova_kylin_25x_metadata/kylin-26646d80-3923-8ce4-1972-d24d197bcef7/ma_aggs_topn_cube_test/counter
18/11/30 11:14:20 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/11/30 11:14:20 INFO internal.SharedState: Warehouse path is 'file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse'.
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68871c45{/SQL,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3071483{/SQL/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53f1ff78{/SQL/execution,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@740d6f25{/SQL/execution/json,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e2af876{/static/sql,null,AVAILABLE,@Spark}
18/11/30 11:14:20 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/11/30 11:14:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/11/30 11:14:21 INFO metastore.ObjectStore: ObjectStore, initialize called
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/11/30 11:14:21 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/11/30 11:14:22 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:24 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/11/30 11:14:25 INFO metastore.ObjectStore: Initialized ObjectStore
18/11/30 11:14:25 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added admin role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: Added public role in metastore
18/11/30 11:14:25 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_all_databases
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_all_databases
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_functions: db=default pat=*
18/11/30 11:14:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/0cd659c1-1104-4364-a9fb-878539d9208c_resources
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created local directory: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/tmp/yarn/0cd659c1-1104-4364-a9fb-878539d9208c
18/11/30 11:14:25 INFO session.SessionState: Created HDFS directory: /tmp/hive/zettics/0cd659c1-1104-4364-a9fb-878539d9208c/_tmp_space.db
18/11/30 11:14:25 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0093/container_e05_1543422353836_0093_02_000001/spark-warehouse
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: default
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: default
18/11/30 11:14:25 INFO metastore.HiveMetaStore: 0: get_database: global_temp
18/11/30 11:14:25 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_database: global_temp
18/11/30 11:14:25 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
18/11/30 11:14:25 INFO execution.SparkSqlParser: Parsing command: zetticsdw.kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO metastore.HiveMetaStore: 0: get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 INFO HiveMetaStore.audit: ugi=zettics      ip=unknown-ip-addr        cmd=get_table : db=zetticsdw tbl=kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69
18/11/30 11:14:26 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:636)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
        at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
        at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
        at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
        at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
        at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
        at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
        at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
        ... 6 more
18/11/30 11:14:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_ma_aggs_topn_cube_test_c870139e_7a00_f5e8_4c6c_bad29be12b69' not found in database 'zetticsdw';)
18/11/30 11:14:26 INFO spark.SparkContext: Invoking stop() from shutdown hook

Kang-sen

From: ShaoFeng Shi <shaofengshi@apache.org>
Sent: Friday, November 30, 2018 8:53 AM
To: user <user@kylin.apache.org>
Subject: Re: anybody used spark to build cube in kylin 2.5.1?

A solution is to put a "java-opts" file in the spark/conf folder, adding the 'hdp.version' configuration, like this:

cat /usr/local/spark/conf/java-opts
-Dhdp.version=2.4.0.0-169


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




Kang-Sen Lu <klu@anovadata.com> wrote on Friday, November 30, 2018 at 9:04 PM:
Thanks for the reply from Yichen and Aron. This is my kylin.properties:

kylin.engine.spark-conf.spark.yarn.archive=hdfs://192.168.230.199:8020/user/zettics/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.6.0-40
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.5.6.0-40

But I still get the same error.

Stack trace: ExitCodeException exitCode=1: /data5/hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0091/container_e05_1543422353836_0091_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

                at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
                at org.apache.hadoop.util.Shell.run(Shell.java:848)
                at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
                at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)

I also saw in stderr:

Log Type: stderr
Log Upload Time: Fri Nov 30 07:54:45 -0500 2018
Log Length: 88
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

I suspect my problem is related to the fact that “${hdp.version}” was not resolved somehow. It seems that kylin.properties parameters like “extraJavaOptions=-Dhdp.version=2.5.6.0-40” were not enough.
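
For the record, the unresolved ${hdp.version} comes from the hadoop-lzo entry in YARN's classpath string (visible in the "bad substitution" line above), and the java-opts approach quoted earlier in the thread covers the cases the extraJavaOptions properties miss. With this cluster's HDP version that would be something like:

echo "-Dhdp.version=2.5.6.0-40" > $KYLIN_HOME/spark/conf/java-opts

(path assumed; the file just needs to sit in the Spark conf dir that Kylin's spark-submit uses).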

Kang-sen

From: Yichen Zhou <zhouycsf@gmail.com>
Sent: Thursday, November 29, 2018 9:08 PM
To: user@kylin.apache.org
Subject: Re: anybody used spark to build cube in kylin 2.5.1?

Hi Kang-Sen,

I think Jiatao is right. If you want to use Spark to build the cube in an HDP cluster, you need to configure -Dhdp.version in $KYLIN_HOME/conf/kylin.properties:

## uncomment for HDP
#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
Please refer to this: http://kylin.apache.org/docs/tutorial/cube_spark.html

Regards,
Yichen


JiaTao Tao <taojiatao@gmail.com> wrote on Friday, November 30, 2018 at 9:57 AM:
Hi

I took a look on the Internet and found these links; give them a try and I hope they help.

https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html

https://stackoverflow.com/questions/32341709/bad-substitution-when-submitting-spark-job-to-yarn-cluster

--

Regards!
Aron Tao


Kang-Sen Lu <klu@anovadata.com> wrote on Thursday, November 29, 2018 at 3:11 PM:
We are running Kylin 2.5.1. For a specific cube, the build for one hour of data took 200 minutes, so I am thinking about building the cube with Spark instead of MapReduce.

I selected Spark in the cube design's advanced settings.

The cube build failed at step 3, with the following error log:

OS command error exit with return code: 1, error message: 18/11/29 09:50:33 INFO client.RMProxy: Connecting to ResourceManager at anovadata6.anovadata.local/192.168.230.199:8050
18/11/29 09:50:33 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
18/11/29 09:50:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (191488 MB per container)
18/11/29 09:50:33 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
18/11/29 09:50:33 INFO yarn.Client: Setting up container launch context for our AM
18/11/29 09:50:33 INFO yarn.Client: Setting up the launch environment for our AM container
18/11/29 09:50:33 INFO yarn.Client: Preparing resources for our AM container
18/11/29 09:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/11/29 09:50:38 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_libs__6261254232609828730.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_libs__6261254232609828730.zip
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/kylin-job-2.5.1-anovadata.jar
18/11/29 09:50:39 WARN yarn.Client: Same path resource file:/home/zettics/kylin/apache-kylin-2.5.1-anovadata-bin/lib/kylin-job-2.5.1-anovadata.jar added multiple times to distributed cache.
18/11/29 09:50:39 INFO yarn.Client: Uploading resource file:/tmp/spark-507691d4-f131-4bc5-bf6c-c8ff7606e201/__spark_conf__1525388499029792228.zip -> hdfs://anovadata6.anovadata.local:8020/user/zettics/.sparkStaging/application_1543422353836_0088/__spark_conf__.zip
18/11/29 09:50:39 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls to: zettics
18/11/29 09:50:39 INFO spark.SecurityManager: Changing view acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: Changing modify acls groups to:
18/11/29 09:50:39 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zettics); groups with view permissions: Set(); users  with modify permissions: Set(zettics); groups with modify permissions: Set()
18/11/29 09:50:39 INFO yarn.Client: Submitting application application_1543422353836_0088 to ResourceManager
18/11/29 09:50:39 INFO impl.YarnClientImpl: Submitted application application_1543422353836_0088
18/11/29 09:50:40 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:40 INFO yarn.Client:
         client token: N/A
        diagnostics: AM container is launched, waiting for AM container to Register with RM
        ApplicationMaster host: N/A
        ApplicationMaster RPC port: -1
        queue: default
        start time: 1543503039903
        final status: UNDEFINED
        tracking URL: http://anovadata6.anovadata.local:8088/proxy/application_1543422353836_0088/
        user: zettics
18/11/29 09:50:41 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:42 INFO yarn.Client: Application report for application_1543422353836_0088 (state: ACCEPTED)
18/11/29 09:50:43 INFO yarn.Client: Application report for application_1543422353836_0088 (state: FAILED)
18/11/29 09:50:43 INFO yarn.Client:
         client token: N/A
        diagnostics: Application application_1543422353836_0088 failed 2 times due to AM Container for appattempt_1543422353836_0088_000002 exited with  exitCode: 1
For more detailed output, check the application tracking page: http://anovadata6.anovadata.local:8088/cluster/app/application_1543422353836_0088 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1543422353836_0088_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/zettics/appcache/application_1543422353836_0088/container_e05_1543422353836_0088_02_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
        at org.apache.hadoop.util.Shell.run(Shell.java:848)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


Thanks.

Kang-sen

