kylin-dev mailing list archives

From "奥威软件" <3513797...@qq.com>
Subject hadoop3, kylin 2.6.1: building cube with Spark, step 3, Spark state stays RUNNING but progress is stuck at 10% with no error messages
Date Fri, 08 Mar 2019 11:42:35 GMT
Hi,
Environment:
hadoop 3.1.2
kylin 2.6.1


The build is stuck at step 3 (#3 Step: Extract Fact Table Distinct Columns).

There are no error messages in kylin.log.
The execution log is as follows:
2019-03-08 11:15:08,660 INFO  [Scheduler 190936888 Job edba142b-b9ec-2379-89f6-2a34511d0425-177]
spark.SparkExecutable:262 : cmd: export HADOOP_CONF_DIR=/home/apache/hbase/conf &&
/home/apache/kylin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry
 --conf spark.executor.instances=40  --conf spark.yarn.archive=hdfs://master:9820/kylin/spark/spark-libs.jar
 --conf spark.network.timeout=600  --conf spark.yarn.queue=default  --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history
 --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec  --conf spark.dynamicAllocation.enabled=true
 --conf spark.master=yarn  --conf spark.storage.memoryFraction=0.5  --conf spark.dynamicAllocation.executorIdleTimeout=300
 --conf spark.hadoop.yarn.timeline-service.enabled=false  --conf spark.executor.memory=4G
 --conf spark.eventLog.enabled=true  --conf spark.eventLog.dir=hdfs:///kylin/spark-history
 --conf spark.dynamicAllocation.minExecutors=1  --conf spark.executor.cores=2  --conf spark.yarn.executor.memoryOverhead=1024
 --conf spark.hadoop.dfs.replication=2  --conf spark.dynamicAllocation.maxExecutors=1000 
--conf spark.executor.memoryOverhead=1024  --conf spark.driver.memory=2G  --conf spark.driver.memoryOverhead=256
 --conf spark.submit.deployMode=cluster  --conf spark.shuffle.service.enabled=true --jars
/home/apache/kylin/lib/kylin-job-2.6.1.jar /home/apache/kylin/lib/kylin-job-2.6.1.jar -className
org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput hdfs://kylincluster/kylin/kylin_metadata/kylin-edba142b-b9ec-2379-89f6-2a34511d0425/spark_cube_low/counter
-statisticssamplingpercent 100 -cubename spark_cube_low -hiveTable default.kylin_intermediate_spark_cube_low_63876e8b_97c8_074d_0356_3f779a6ca1b6
-output hdfs://kylincluster/kylin/kylin_metadata/kylin-edba142b-b9ec-2379-89f6-2a34511d0425/spark_cube_low/fact_distinct_columns
-input hdfs://kylincluster/kylin/kylin_metadata/kylin-edba142b-b9ec-2379-89f6-2a34511d0425/kylin_intermediate_spark_cube_low_63876e8b_97c8_074d_0356_3f779a6ca1b6
-segmentId 63876e8b-97c8-074d-0356-3f779a6ca1b6 -metaUrl kylin_metadata@hdfs,path=hdfs://kylincluster/kylin/kylin_metadata/kylin-edba142b-b9ec-2379-89f6-2a34511d0425/spark_cube_low/metadata
2019-03-08 11:15:09,423 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:09
WARN  SparkConf:66 - The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated
as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead'
instead.
2019-03-08 11:15:10,478 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10
INFO  Client:54 - Requesting a new application from cluster with 2 NodeManagers
2019-03-08 11:15:10,541 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10
INFO  Client:54 - Verifying our application has not requested more than the maximum memory
capability of the cluster (8000 MB per container)
2019-03-08 11:15:10,542 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10
INFO  Client:54 - Will allocate AM container, with 2304 MB memory including 256 MB overhead
2019-03-08 11:15:10,542 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10
INFO  Client:54 - Setting up container launch context for our AM
2019-03-08 11:15:10,544 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10
INFO  Client:54 - Setting up the launch environment for our AM container
2019-03-08 11:15:10,552 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10
INFO  Client:54 - Preparing resources for our AM container
2019-03-08 11:15:11,553 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:11
INFO  Client:54 - Uploading resource hdfs://master:9820/kylin/spark/spark-libs.jar -> hdfs://kylincluster/user/root/.sparkStaging/application_1552042715496_0010/spark-libs.jar
2019-03-08 11:15:13,978 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:13
INFO  Client:54 - Uploading resource file:/home/apache/kylin/lib/kylin-job-2.6.1.jar ->
hdfs://kylincluster/user/root/.sparkStaging/application_1552042715496_0010/kylin-job-2.6.1.jar
2019-03-08 11:15:14,485 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14
WARN  Client:66 - Same name resource file:///home/apache/kylin/lib/kylin-job-2.6.1.jar added
multiple times to distributed cache
2019-03-08 11:15:14,594 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14
INFO  Client:54 - Uploading resource file:/tmp/spark-1e734412-567c-4c59-a6c6-9c58c83355db/__spark_conf__6590808330343993974.zip
-> hdfs://kylincluster/user/root/.sparkStaging/application_1552042715496_0010/__spark_conf__.zip
2019-03-08 11:15:14,796 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14
INFO  SecurityManager:54 - Changing view acls to: root
2019-03-08 11:15:14,797 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14
INFO  SecurityManager:54 - Changing modify acls to: root
2019-03-08 11:15:14,798 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14
INFO  SecurityManager:54 - Changing view acls groups to: 
2019-03-08 11:15:14,798 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14
INFO  SecurityManager:54 - Changing modify acls groups to: 
2019-03-08 11:15:14,800 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14
INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users
 with view permissions: Set(root); groups with view permissions: Set(); users  with modify
permissions: Set(root); groups with modify permissions: Set()
2019-03-08 11:15:14,811 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14
INFO  Client:54 - Submitting application application_1552042715496_0010 to ResourceManager
2019-03-08 11:15:15,038 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:15
INFO  YarnClientImpl:273 - Submitted application application_1552042715496_0010
2019-03-08 11:15:16,042 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:16
INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:16
INFO  Client:54 - 
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 client token:
N/A
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 diagnostics:
AM container is launched, waiting for AM container to Register with RM
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 ApplicationMaster
host: N/A
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 ApplicationMaster
RPC port: -1
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 queue: default
2019-03-08 11:15:16,047 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 start time:
1552043714819
2019-03-08 11:15:16,047 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 final status:
UNDEFINED
2019-03-08 11:15:16,047 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 tracking URL:
http://master:8088/proxy/application_1552042715496_0010/
2019-03-08 11:15:16,050 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 user: root
2019-03-08 11:15:17,048 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:17
INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:18,050 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:18
INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:19,052 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:19
INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:20,054 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:20
INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:21,056 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:21
INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:22,058 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:22
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:22
INFO  Client:54 - 
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 client token:
N/A
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 diagnostics:
N/A
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 ApplicationMaster
host: 192.168.0.110
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 ApplicationMaster
RPC port: 0
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 queue: default
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 start time:
1552043714819
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 final status:
UNDEFINED
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 tracking URL:
http://master:8088/proxy/application_1552042715496_0010/
2019-03-08 11:15:22,063 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 	 user: root
2019-03-08 11:15:23,061 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:23
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:24,062 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:24
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:24,647 INFO  [FetcherRunner 1227064704-51] threadpool.DefaultFetcherRunner:94
: Job Fetcher: 1 should running, 1 actual running, 1 stopped, 0 ready, 22 already succeed,
29 error, 0 discarded, 0 others
2019-03-08 11:15:25,065 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:25
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:26,067 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:26
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:27,069 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:27
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:28,071 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:28
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:29,073 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:29
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:30,075 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:30
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:31,077 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:31
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:32,079 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:32
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:33,081 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:33
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:33,487 DEBUG [http-nio-7070-exec-4] common.KylinConfig:328 : KYLIN_CONF property
was not set, will seek KYLIN_HOME env variable
2019-03-08 11:15:33,487 INFO  [http-nio-7070-exec-4] common.KylinConfig:334 : Use KYLIN_HOME=/home/apache/kylin
2019-03-08 11:15:34,083 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:34
INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:34,242 DEBUG [http-nio-7070-exec-3] badquery.BadQueryHistoryManager:65 :
Loaded 0 Bad Query(s)
2019-03-08 11:15:35,085 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:35
INFO  Client:54 - Application report for application_15520427



Spark itself looks normal; the client keeps logging "INFO  Client:54 - Application report for application_1552042715496_0010
(state: RUNNING)",
but after running for one to two hours the progress is still at 10%.
The Spark job page shows the following:



Building this same cube with the MapReduce engine takes only a few minutes.


hadoop-root-resourcemanager-master.log shows YARN containers constantly in use, with the count fluctuating between 1 and 4.
The ResourceManager log is below:
2019-03-08 11:32:34,829 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Allocation proposal accepted
2019-03-08 11:32:37,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e89_1552042715496_0010_01_000409 Container Transitioned from ACQUIRED to RELEASED
2019-03-08 11:32:37,590 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=root	IP=192.168.0.110	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS
APPID=application_1552042715496_0010	CONTAINERID=container_e89_1552042715496_0010_01_000409
RESOURCE=<memory:6144, vCores:1>	QUEUENAME=default
2019-03-08 11:32:37,591 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e89_1552042715496_0010_01_000410 Container Transitioned from ALLOCATED to ACQUIRED
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
assignedContainer application attempt=appattempt_1552042715496_0010_000001 container=null
queue=default clusterResource=<memory:32000, vCores:24> type=NODE_LOCAL requestedPartition=
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e89_1552042715496_0010_01_000411 Container Transitioned from NEW to ALLOCATED
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
Assigned container container_e89_1552042715496_0010_01_000411 of capacity <memory:6144,
vCores:1> on host master:33725, which has 2 containers, <memory:12288, vCores:2>
used and <memory:3712, vCores:10> available after allocation
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=root	OPERATION=AM Allocated Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1552042715496_0010
CONTAINERID=container_e89_1552042715496_0010_01_000411	RESOURCE=<memory:6144, vCores:1>
QUEUENAME=default
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
assignedContainer queue=root usedCapacity=0.512 absoluteUsedCapacity=0.512 used=<memory:16384,
vCores:3> cluster=<memory:32000, vCores:24>
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Allocation proposal accepted
2019-03-08 11:32:40,595 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e89_1552042715496_0010_01_000410 Container Transitioned from ACQUIRED to RELEASED
2019-03-08 11:32:40,595 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=root	IP=192.168.0.110	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS
APPID=application_1552042715496_0010	CONTAINERID=container_e89_1552042715496_0010_01_000410
RESOURCE=<memory:6144, vCores:1>	QUEUENAME=default
2019-03-08 11:32:40,596 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e89_1552042715496_0010_01_000411 Container Transitioned from ALLOCATED to ACQUIRED
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
assignedContainer application attempt=appattempt_1552042715496_0010_000001 container=null
queue=default clusterResource=<memory:32000, vCores:24> type=NODE_LOCAL requestedPartition=
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e89_1552042715496_0010_01_000412 Container Transitioned from NEW to ALLOCATED
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
Assigned container container_e89_1552042715496_0010_01_000412 of capacity <memory:6144,
vCores:1> on host master:33725, which has 2 containers, <memory:12288, vCores:2>
used and <memory:3712, vCores:10> available after allocation
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=root	OPERATION=AM Allocated Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1552042715496_0010
CONTAINERID=container_e89_1552042715496_0010_01_000412	RESOURCE=<memory:6144, vCores:1>
QUEUENAME=default
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
assignedContainer queue=root usedCapacity=0.512 absoluteUsedCapacity=0.512 used=<memory:16384,
vCores:3> cluster=<memory:32000, vCores:24>
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Allocation proposal accepted
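For context, the Spark flags in the spark-submit command earlier are generated from the `kylin.engine.spark-conf.*` overrides in kylin.properties; the prefix maps one-to-one onto `--conf` keys. A sketch of the equivalent entries, with values copied from the command above (shown only to make the configuration easier to read, not a recommendation):

```properties
# Spark engine overrides in kylin.properties
# ("kylin.engine.spark-conf.<key>" becomes "--conf <key>" on spark-submit)
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.executor.memoryOverhead=1024
kylin.engine.spark-conf.spark.executor.cores=2
kylin.engine.spark-conf.spark.driver.memory=2G
kylin.engine.spark-conf.spark.yarn.archive=hdfs://master:9820/kylin/spark/spark-libs.jar
```

One thing worth double-checking with this combination: `spark.shuffle.service.enabled=true` together with dynamic allocation requires the Spark external shuffle service to be registered as the `spark_shuffle` auxiliary service in every NodeManager's yarn-site.xml; without it, executors can be released right after allocation, which would match the acquire/release churn in the ResourceManager log above.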







The diagnosis package is attached.
Could you please help take a look at how to resolve this? Thanks!

Best regards