kylin-issues mailing list archives

From "yangcao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KYLIN-3428) java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Date Tue, 26 Jun 2018 11:28:00 GMT

     [ https://issues.apache.org/jira/browse/KYLIN-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yangcao updated KYLIN-3428:
---------------------------
    Labels: Build_Base_Cuboid MAP OOM  (was: Build_Base_Cuboid OOM)

> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> -----------------------------------------------------------------
>
>                 Key: KYLIN-3428
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3428
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.1.0, v2.2.0, v2.3.0, v2.3.1, v2.4.0
>         Environment: kylin v2.2.0   jdk7
>            Reporter: yangcao
>            Priority: Critical
>              Labels: Build_Base_Cuboid, MAP, OOM
>         Attachments: patch-v1.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> LOG:
> 2018-06-26 15:50:24,032 INFO [main] org.apache.kylin.dict.DictionaryManager: DictionaryManager(1499050426) loading DictionaryInfo(loadDictObj:true) at /dict/xxx.xxx/C7/036b7ca0-8733-4c0c-99f5-5122919fd3dd.dict
> 2018-06-26 15:50:25,586 ERROR [main] org.apache.kylin.engine.mr.KylinMapper: com.google.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
>     at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
>     at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:118)
>     at org.apache.kylin.cube.CubeManager.getDictionary(CubeManager.java:271)
>     at org.apache.kylin.cube.CubeSegment.getDictionary(CubeSegment.java:320)
>     at org.apache.kylin.cube.kv.CubeDimEncMap.getDictionary(CubeDimEncMap.java:86)
>     at org.apache.kylin.cube.kv.CubeDimEncMap.get(CubeDimEncMap.java:65)
>     at org.apache.kylin.cube.kv.RowKeyColumnIO.getColumnLength(RowKeyColumnIO.java:43)
>     at org.apache.kylin.cube.kv.RowKeyEncoder.<init>(RowKeyEncoder.java:59)
>     at org.apache.kylin.cube.kv.AbstractRowKeyEncoder.createInstance(AbstractRowKeyEncoder.java:48)
>     at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.<init>(BaseCuboidBuilder.java:84)
>     at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.doSetup(BaseCuboidMapperBase.java:70)
>     at org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.doSetup(HiveToBaseCuboidMapper.java:36)
>     at org.apache.kylin.engine.mr.KylinMapper.setup(KylinMapper.java:48)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>     at java.util.Arrays.copyOf(Arrays.java:2271)
>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1793)
>     at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
>     at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
>     at org.apache.kylin.common.persistence.FileResourceStore.getResourceImpl(FileResourceStore.java:123)
>     at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:154)
>     at org.apache.kylin.dict.DictionaryManager.load(DictionaryManager.java:418)
>     at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:101)
>     at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:98)
>     at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
>     at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
>     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
>     at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
>     at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:118)
>     at org.apache.kylin.cube.CubeManager.getDictionary(CubeManager.java:271)
>     at org.apache.kylin.cube.CubeSegment.getDictionary(CubeSegment.java:320)
>     at org.apache.kylin.cube.kv.CubeDimEncMap.getDictionary(CubeDimEncMap.java:86)
>     at org.apache.kylin.cube.kv.CubeDimEncMap.get(CubeDimEncMap.java:65)
>     at org.apache.kylin.cube.kv.RowKeyColumnIO.getColumnLength(RowKeyColumnIO.java:43)
>     at org.apache.kylin.cube.kv.RowKeyEncoder.<init>(RowKeyEncoder.java:59)
>     at org.apache.kylin.cube.kv.AbstractRowKeyEncoder.createInstance(AbstractRowKeyEncoder.java:48)
>     at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.<init>(BaseCuboidBuilder.java:84)
>     at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.doSetup(BaseCuboidMapperBase.java:70)
>     at org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.doSetup(HiveToBaseCuboidMapper.java:36)
>     at org.apache.kylin.engine.mr.KylinMapper.setup(KylinMapper.java:48)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>  
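> Root cause analysis:
>  1. C7 is a high-cardinality dimension whose values are long on average. Its dictionary file is 1085484823 bytes.
>  2. Kylin loads the dictionary file in FileResourceStore.getResourceImpl(). The ByteArrayOutputStream used there starts with a capacity of 1000 and is expanded repeatedly during the copy. The expansion logic is shown below (each expansion at least doubles the capacity, up to a maximum of Integer.MAX_VALUE):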
>     private void grow(int minCapacity) {
>         // overflow-conscious code
>         int oldCapacity = buf.length;
>         int newCapacity = oldCapacity << 1;
>         if (newCapacity - minCapacity < 0)
>             newCapacity = minCapacity;
>         if (newCapacity < 0) {
>             if (minCapacity < 0) // overflow
>                 throw new OutOfMemoryError();
>             newCapacity = Integer.MAX_VALUE;
>         }
>         buf = Arrays.copyOf(buf, newCapacity);
>     }
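>  3. The JVM limits the length of an array. The upper bound may differ between environments; it can be measured with byte[] bytes = new byte[length] and is typically Integer.MAX_VALUE - 2. When doubling the capacity overflows int, grow() requests Integer.MAX_VALUE elements, which is above that bound, so Arrays.copyOf fails with "Requested array size exceeds VM limit".
>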
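
The growth described in points 2 and 3 can be traced without allocating any memory. The sketch below is only an illustration: it replays the grow() arithmetic quoted above for the file size and initial capacity given in the report. The class and method names are hypothetical, and the 4096-byte chunk size is an assumption about the buffer that IOUtils.copy uses when writing to the stream.

```java
class GrowthSimulation {
    // Same arithmetic as the JDK 7 grow() quoted above, but it returns the
    // capacity that would be requested instead of allocating the array.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity << 1;
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity < 0) {
            if (minCapacity < 0) // overflow
                throw new OutOfMemoryError();
            newCapacity = Integer.MAX_VALUE;
        }
        return newCapacity;
    }

    // Largest array length requested while writing totalBytes in fixed-size
    // chunks. chunkSize is an assumption about the copy buffer, not a value
    // taken from the report.
    static int largestRequest(int totalBytes, int initialCapacity, int chunkSize) {
        int capacity = initialCapacity;
        int count = 0;
        while (count < totalBytes) {
            int len = Math.min(chunkSize, totalBytes - count);
            int minCapacity = count + len;
            if (minCapacity - capacity > 0) {
                capacity = nextCapacity(capacity, minCapacity);
                if (capacity == Integer.MAX_VALUE)
                    return capacity; // the real stream would fail in Arrays.copyOf here
            }
            count += len;
        }
        return capacity;
    }

    public static void main(String[] args) {
        // File size and initial capacity from the report.
        System.out.println(largestRequest(1085484823, 1000, 4096)); // 2147483647
        // A slightly smaller, hypothetical file that still fits.
        System.out.println(largestRequest(1000000000, 1000, 4096)); // 1073741824
    }
}
```

Under these assumptions the capacity climbs to 2^30 = 1073741824, which is still smaller than the 1085484823-byte dictionary, so the next doubling overflows and the request becomes Integer.MAX_VALUE. A file just under 2^30 bytes never reaches that step.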
> Suggested fix:
> Set the initial capacity of the ByteArrayOutputStream to the file's byte length, so that no expansion is needed.
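
A minimal sketch of the suggested fix follows. It is not the attached patch-v1.patch, whose contents are not shown in this message, and the class name, method name, and size guard are hypothetical. It assumes the resource is a local file whose length is known before reading.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class PresizedRead {
    // Reads a whole file into memory using a stream that is sized up front,
    // so ByteArrayOutputStream.grow() is never triggered during the copy.
    static byte[] readFully(File file) throws IOException {
        long length = file.length();
        // Conservative guard (an assumption): stay a little below
        // Integer.MAX_VALUE because the exact VM array limit varies.
        if (length > Integer.MAX_VALUE - 8)
            throw new IOException("File too large for a single byte array: " + length);

        ByteArrayOutputStream out = new ByteArrayOutputStream((int) length);
        try (InputStream in = new FileInputStream(file)) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1)
                out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1)
            System.out.println(readFully(new File(args[0])).length + " bytes read");
    }
}
```

Note that toByteArray() still makes one copy of the buffer, so peak heap use in this sketch is roughly twice the file size. Pre-sizing removes the overflow in grow(), but it does not by itself reduce the memory needed to hold a dictionary of this size.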



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
