kylin-user mailing list archives

From 仇同心 <qiutong...@jd.com>
Subject Re: Cube build optimization inquiry
Date Mon, 14 Nov 2016 07:03:42 GMT
Following the suggestion, I built the cube in batches of 15 days each. The builds succeeded, but when the segments were auto-merged, an error appeared:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at java.util.Arrays.copyOf(Arrays.java:2271)
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2147)
	at org.apache.commons.io.IOUtils.copy(IOUtils.java:2102)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2123)
	at org.apache.commons.io.IOUtils.copy(IOUtils.java:2078)
	at org.apache.kylin.storage.hbase.HBaseResourceStore.putResourceImpl(HBaseResourceStore.java:239)
	at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:208)
	at org.apache.kylin.dict.DictionaryManager.save(DictionaryManager.java:413)
	at org.apache.kylin.dict.DictionaryManager.saveNewDict(DictionaryManager.java:209)
	at org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:176)
	at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:269)
	at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.mergeDictionaries(MergeDictionaryStep.java:145)
	at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.makeDictForNewSegment(MergeDictionaryStep.java:135)
	at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.doWork(MergeDictionaryStep.java:67)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
	at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
	at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
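
For context, this is not ordinary heap exhaustion: the stack trace shows the merged dictionary being buffered into a ByteArrayOutputStream, and a Java byte array cannot grow past roughly Integer.MAX_VALUE bytes (about 2 GB) no matter how much heap is configured. A minimal, illustrative Java sketch of that VM limit (not Kylin code):

public class ArraySizeLimitDemo {
    public static void main(String[] args) {
        try {
            // HotSpot caps array lengths a few elements below
            // Integer.MAX_VALUE, so this request is rejected with
            // "Requested array size exceeds VM limit" even when
            // plenty of heap is free.
            byte[] buf = new byte[Integer.MAX_VALUE];
            System.out.println("allocated " + buf.length + " bytes");
        } catch (OutOfMemoryError e) {
            // Same error class and message as in the merge job above:
            // ByteArrayOutputStream.grow() ends up asking Arrays.copyOf()
            // for a backing array beyond the VM limit.
            System.out.println("caught: " + e.getMessage());
        }
    }
}

So the merged dictionary's serialized form is simply too large to hold in a single in-memory buffer, which is why the error appears at merge time rather than during the individual segment builds.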




-----Original Message-----
From: Luke Han [mailto:luke.hq@gmail.com]
Sent: November 12, 2016 0:1
To: user@kylin.apache.org
Cc: dev
Subject: Re: Cube build optimization inquiry

Don't try to run such a huge job in one go; please run the builds one by one, for example, run one month of data and then the next...
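
A hedged sketch of that one-segment-at-a-time approach, driven through Kylin's documented REST build endpoint (PUT /kylin/api/cubes/{cubeName}/rebuild); the host, cube name, year, and ADMIN:KYLIN credentials below are placeholders:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Calendar;
import java.util.TimeZone;

public class MonthlyCubeBuild {
    public static void main(String[] args) throws Exception {
        String kylinHost = "http://localhost:7070";  // placeholder host
        String cubeName = "my_cube";                 // placeholder cube
        String auth = Base64.getEncoder().encodeToString(
                "ADMIN:KYLIN".getBytes(StandardCharsets.UTF_8)); // placeholder

        // Month boundaries in epoch millis (UTC): Jan..Nov 2016,
        // giving ten one-month segments.
        long[] bounds = new long[11];
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.clear();
        for (int m = 0; m < 11; m++) {
            cal.set(2016, m, 1);  // Calendar months are 0-based
            bounds[m] = cal.getTimeInMillis();
        }

        for (int m = 0; m < 10; m++) {
            String body = String.format(
                "{\"startTime\": %d, \"endTime\": %d, \"buildType\": \"BUILD\"}",
                bounds[m], bounds[m + 1]);
            URL url = new URL(kylinHost + "/kylin/api/cubes/" + cubeName + "/rebuild");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setRequestProperty("Authorization", "Basic " + auth);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("segment " + (m + 1) + " submitted, HTTP "
                + conn.getResponseCode());
            // In practice, poll GET /kylin/api/jobs/{jobId} and wait for
            // each segment to finish before submitting the next one.
        }
    }
}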




Best Regards!
---------------------

Luke Han

2016-11-10 14:54 GMT+08:00 仇同心 <qiutongxin@jd.com>:

> Hi all:
>
>      We are currently running into a problem when building a cube: the
> cardinality of the cube's dimensions is not very high, but the cardinality
> of the measure columns is very high, so the Build Dimension Dictionary step
> consumes a huge amount of local memory. The selected measures have
> cardinalities in the tens of millions, hundreds of millions, even around a
> billion, and most of the measures are SUM and exact Count_distinct
> calculations. The data covers 10 months, and we planned to run all 10
> months of historical data in one pass, then run daily incremental jobs.
>
>     The server has 125 GB of memory, and #4 Step Name: Build Dimension
> Dictionary keeps running for a very long time, eventually causing an
> out-of-memory error.
>
>      Is there a good optimization approach for this kind of
> high-cardinality measure problem?
>
> Thanks~
>