kylin-user mailing list archives

From Ashish Singhi <ashishsin...@apache.org>
Subject Re: Queries For Building Cube
Date Thu, 16 Aug 2018 08:50:06 GMT
Hi Shrikant,

Please refer to http://kylin.apache.org/blog/2015/08/13/kylin-dictionary/
You might find it useful for understanding how the dictionary works.
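
As a rough illustration of the idea (not Kylin's actual TrieDictionaryBuilder code), here is a minimal Python sketch of an order-preserving trie dictionary: each distinct dimension value gets an integer ID equal to its rank in sorted order, and the mapping works in both directions. The names TrieNode, build_trie, value_to_id and id_to_value are hypothetical, chosen only for this example, and the sketch assumes plain string values.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # next character -> TrieNode
        self.is_value = False  # True if a dimension value ends at this node
        self.count = 0         # number of distinct values in this subtree


def build_trie(values):
    """Build a trie over the distinct values (hypothetical helper)."""
    root = TrieNode()
    for value in set(values):
        node = root
        node.count += 1
        for ch in value:
            node = node.children.setdefault(ch, TrieNode())
            node.count += 1
        node.is_value = True
    return root


def value_to_id(root, value):
    """Return the rank of `value` among the sorted distinct values."""
    node, rank = root, 0
    for ch in value:
        if node.is_value:
            rank += 1  # a shorter value that is a prefix sorts first
        for c in sorted(node.children):
            if c < ch:
                rank += node.children[c].count  # whole subtree sorts first
        if ch not in node.children:
            raise KeyError(value)
        node = node.children[ch]
    if not node.is_value:
        raise KeyError(value)
    return rank


def id_to_value(root, target):
    """Inverse lookup: walk down the trie to the value with rank `target`."""
    node, prefix = root, ""
    while True:
        if node.is_value:
            if target == 0:
                return prefix
            target -= 1
        for c in sorted(node.children):
            child = node.children[c]
            if target < child.count:
                node, prefix = child, prefix + c
                break
            target -= child.count
        else:
            raise IndexError("id out of range")


trie = build_trie(["banana", "app", "apple", "app"])
ids = {v: value_to_id(trie, v) for v in ["app", "apple", "banana"]}
```

Because the IDs follow the sort order of the values, comparisons on the original values can be done on the compact integer IDs instead. The sketch keeps one node object per character, which also hints at why building such a structure for a very high cardinality column can need a lot of heap.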

Regards,
Ashish

On Thu, Aug 16, 2018 at 10:33 AM, Shrikant Bang <b.shrikant28@gmail.com>
wrote:

> Thank you, ShaoFeng and Billy, for your responses.
>
> I was able to set the hierarchies in the dimensions.
>
> While building the cube, the "fact distinct column" step fails in a
> reducer with an out-of-memory error:
>
> java.lang.OutOfMemoryError: Java heap space
> at java.util.IdentityHashMap.resize(IdentityHashMap.java:471)
> at java.util.IdentityHashMap.put(IdentityHashMap.java:440)
> at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:476)
> at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:418)
> at org.apache.kylin.dict.TrieDictionaryForestBuilder.build(TrieDictionaryForestBuilder.java:109)
> at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.build(DictionaryGenerator.java:220)
> at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doCleanup(FactDistinctColumnsReducer.java:216)
> at org.apache.kylin.engine.mr.KylinReducer.cleanup(KylinReducer.java:103)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
>
> I tried debugging and found that the dictionary is built in the
> reducer's cleanup method.
>
> I am curious to learn the internals. Could you please help me with the
> following:
>
>   1.  Any pointer/reference/JIRA for understanding how the trie (dictionary)
> of a dimension's values is used in the subsequent steps?
>
>   2.  Any best practices/references for tuning the "fact distinct column" job
> for reducers that handle high-cardinality columns? For now I am trying to
> increase memory, since the partitioning and the number of reducers depend
> on the number of cuboids.
>
>
> P.S. I am using Kylin v2.4 with HBase 1.x.
>
> Thank You,
> Shrikant Bang
>
> On Tue, Aug 14, 2018 at 8:33 PM ShaoFeng Shi <shaofengshi@apache.org>
> wrote:
>
>> For question 1), in the cube's "Advanced Setting" step, you can specify
>> the whitelist of cuboids to build.
>>
>> 2018-08-13 22:26 GMT+08:00 Billy Liu <billyliu@apache.org>:
>>
>>> Hello Shrikant,
>>>
>>> For 1, it seems the 4 dimensions form a hierarchy. You could
>>> define them as hierarchy dimensions in the cube, and set A as a mandatory
>>> dimension.
>>>
>>> For 2, select the date column of 'user_activity' as the partition column
>>> in the model design. There are a few built-in date formats, and most date
>>> types are supported.
>>>
>>> With Warm regards
>>>
>>> Billy Liu
>>> On Mon, Aug 13, 2018 at 5:39 PM, Shrikant Bang <b.shrikant28@gmail.com> wrote:
>>> >
>>> > Hi Team,
>>> >
>>> >      We are doing a PoC on building OLAP cubes. Could you please help
>>> > me with answers to the queries below?
>>> >
>>> > Selective Cuboids:
>>> > We need to build only selected cuboids as part of the OLAP cubes.
>>> > Let's say we have 4 dimensions: A, B, C, D. Then we need just
>>> > (A,B,C,D), (A,B,C), (A,B) and (A).
>>> >
>>> > Refresh Settings:
>>> > How do we specify the partition column and its format for the fact
>>> > table while building a cube?
>>> > E.g. user_activity is partitioned by date ('yyyy-MM-dd'), and the cube
>>> > should be refreshed every day with the previous day's data.
>>> >
>>> >
>>> > Thank You,
>>> > Shrikant Bang
>>> >
>>>
>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
