kylin-user mailing list archives

From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: AppendTrieDictionary with GlobalDictionary 1.6
Date Fri, 23 Jun 2017 11:24:07 GMT
Is the data type of "USER_ID" bigint (long)?
https://github.com/apache/kylin/blob/kylin-1.6.0/core-metadata/src/main/java/org/apache/kylin/measure/bitmap/BitmapMeasureType.java#L159

Please provide detailed metadata for troubleshooting; that is important for
analysis. Otherwise we can only guess, and there are many possible causes
of a problem...
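A minimal sketch of why the column type matters here, assuming (as the question and the linked BitmapMeasureType line suggest, though the thread does not spell it out) that the bitmap used for precise count distinct holds 32-bit integers, so int-family values can go in directly while bigint or string values must first be mapped through a dictionary. `needsDictionary` and `fitsInInt` are hypothetical helpers for illustration, not Kylin APIs.

```java
import java.util.Locale;

public class BitmapInputCheck {

    // ASSUMPTION (not Kylin's code): the bitmap stores 32-bit ints, so only
    // int-family types are used as-is; every other type needs a dictionary.
    public static boolean needsDictionary(String sqlType) {
        switch (sqlType.toLowerCase(Locale.ROOT)) {
            case "tinyint":
            case "smallint":
            case "int":
            case "integer":
                return false;
            default: // bigint, varchar, ...: must be mapped to ints first
                return true;
        }
    }

    // A bigint value could only be used directly if it lies within int range.
    public static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(needsDictionary("bigint"));  // true
        System.out.println(needsDictionary("INT"));     // false
        System.out.println(fitsInInt(3_000_000_000L));  // false
    }
}
```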

2017-06-23 14:31 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:

> It's a dimension and a count distinct measure. No GD.
>
> On Thu, Jun 22, 2017 at 11:27 PM ShaoFeng Shi <shaofengshi@apache.org>
> wrote:
>
>> Does the "USER_ID" column appear in other measures?
>>
>> 2017-06-23 13:57 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>>
>>> It is set to this:
>>>
>>>       {
>>>         "column": "USER_ID",
>>>         "encoding": "integer:4",
>>>         "isShardBy": true
>>>       },
>>>
>>>
>>> Not sure why it's still trying to build the dictionary. Keep in mind this column
>>> is a measure too (is that why it tries to create a dict?).
>>>
>>> This is in the logs before the exception:
>>>
>>> dict.DictionaryManager:314 : Building dictionary object {"uuid":"4fbbfdb1-5ae3-4e0b-ae73-c071745b60d6","last_modified":0,"version":"1.6.0","source_table":"DB.USER","source_column":"USER_ID","source_column_index":0,"data_type":"bigint","input":{"path":"/kylin/kylin_metadata/kylin-30760e39-c2bb-4bc7-8239-e18e98d697e0/My_Cube_Name/fact_distinct_columns/USER_ID","size":177579241,"last_modified_time":1498183660303},"dictionary_class":null,"cardinality":0}
>>>
>>> On Thu, Jun 22, 2017 at 10:47 PM, ShaoFeng Shi <shaofengshi@apache.org>
>>> wrote:
>>>
>>>> It seems Kylin is still trying to build a dictionary for the UHC dimension.
>>>> Could you double-check the dimension encoding setting in the "Advanced"
>>>> step?
>>>>
>>>> 2017-06-23 12:54 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>>>>
>>>>> Step 4. We gave 4 GB to the Kylin server.
>>>>>
>>>>> #4 Step Name: Build Dimension Dictionary
>>>>>
>>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>>         at java.util.IdentityHashMap.resize(IdentityHashMap.java:471)
>>>>>         at java.util.IdentityHashMap.put(IdentityHashMap.java:440)
>>>>>         at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:464)
>>>>>         at org.apache.kylin.dict.NumberDictionaryBuilder.build(NumberDictionaryBuilder.java:43)
>>>>>         at org.apache.kylin.dict.DictionaryGenerator$NumberDictBuilder.build(DictionaryGenerator.java:186)
>>>>>         at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:81)
>>>>>         at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:73)
>>>>>         at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:321)
>>>>>         at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:222)
>>>>>         at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:50)
>>>>>         at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:41)
>>>>>         at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:54)
>>>>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>>>>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>>>>>         at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>>>>>         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
>>>>>
>>>>> On Thu, Jun 22, 2017 at 7:59 PM, ShaoFeng Shi <shaofengshi@apache.org>
>>>>> wrote:
>>>>>
>>>>>> In which step did it run out of memory? Could you share the JSON of the
>>>>>> Cube definition? It can be found in the "JSON(Cube)" tab.
>>>>>>
>>>>>> 2017-06-23 8:48 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>>>>>>
>>>>>>> The column has a count distinct measure as well. So it still doesn't
>>>>>>> need GD? I tried, but it appears to have run out of memory.
>>>>>>>
>>>>>>> On Thu, Jun 22, 2017 at 5:36 PM, ShaoFeng Shi <
>>>>>>> shaofengshi@apache.org> wrote:
>>>>>>>
>>>>>>>> For integer values, Global Dictionary is not needed.
>>>>>>>>
>>>>>>>> So all you need to do is set "integer:4" as the encoding of the
>>>>>>>> dimension and leave the global dictionary blank.
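To illustrate what "integer:4" means in principle: a fixed-length integer encoding writes each value into N bytes directly, so no dictionary has to be built or held on the heap. This is a simplified stand-in (big-endian two's complement with a range check), not Kylin's actual encoder; `encode` and `decode` are hypothetical names. It also shows the catch for a bigint column: 4 bytes only cover the signed 32-bit range, so in this sketch wider values need a larger width.

```java
import java.nio.ByteBuffer;

public class FixedIntEncodingSketch {

    // Hypothetical encoder: store value in exactly `width` bytes (big-endian,
    // two's complement). Throws if it does not fit; a real encoder might map
    // out-of-range values to null instead.
    public static byte[] encode(long value, int width) {
        if (width < 1 || width > 8) throw new IllegalArgumentException("width must be 1..8");
        if (width < 8) {
            long max = (1L << (8 * width - 1)) - 1;
            long min = -(1L << (8 * width - 1));
            if (value < min || value > max) {
                throw new IllegalArgumentException(value + " does not fit in " + width + " bytes");
            }
        }
        byte[] full = ByteBuffer.allocate(8).putLong(value).array();
        byte[] out = new byte[width];
        System.arraycopy(full, 8 - width, out, 0, width); // keep the low-order bytes
        return out;
    }

    // Decode by sign-extending the first byte, then appending the rest.
    public static long decode(byte[] bytes) {
        long v = bytes[0];
        for (int i = 1; i < bytes.length; i++) {
            v = (v << 8) | (bytes[i] & 0xFF);
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] b = encode(123456789L, 4);
        System.out.println(b.length + " bytes -> " + decode(b)); // 4 bytes -> 123456789
        try {
            encode(5_000_000_000L, 4);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```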
>>>>>>>>
>>>>>>>> 2017-06-23 6:30 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>>>>>>>>
>>>>>>>>> Thanks ShaoFeng.
>>>>>>>>>
>>>>>>>>> So to clarify: the UHC dimension is an integer. Can I set its
>>>>>>>>> encoding to integer and also include it in GD for count distinct? Or
>>>>>>>>> should I leave it out of GD and add it with integer encoding only?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jun 21, 2017 at 10:55 PM, ShaoFeng Shi <
>>>>>>>>> shaofengshi@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sonny,
>>>>>>>>>>
>>>>>>>>>> I see; it is a defect: Kylin uses at most one dictionary per column, so
>>>>>>>>>> it can't differentiate between an ordinary dict and a Global dict when
>>>>>>>>>> that column is used in both a dimension and a measure.
>>>>>>>>>>
>>>>>>>>>> 25 million is an Ultra High Cardinality dimension; it is not suitable
>>>>>>>>>> for a dict, as the dict size would exceed the Java heap size. In this
>>>>>>>>>> case, please use fixed_length encoding; if that column is of integer or
>>>>>>>>>> long type, you can use "integer" encoding. Meanwhile, keep using GD for
>>>>>>>>>> the count distinct measure.
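The defect can be pictured as a registry with a single dictionary slot per column: when both the dimension and the count distinct measure on USER_ID ask for a dictionary, only one kind can be kept. The sketch models this with hypothetical `DictKind` and `DictionaryRegistry` types (not Kylin classes); which request wins in real Kylin is not stated in the thread, so here the first one simply wins. The suggested workaround sidesteps the collision by giving the dimension a non-dictionary encoding ("integer" or fixed_length), leaving the GD to the measure alone.

```java
import java.util.HashMap;
import java.util.Map;

public class OneDictPerColumnSketch {

    public enum DictKind { ORDINARY_TRIE, GLOBAL_APPEND }

    // Hypothetical registry: at most one dictionary kind per column name.
    public static class DictionaryRegistry {
        private final Map<String, DictKind> byColumn = new HashMap<>();

        // Returns the kind that will actually be used for the column
        // (in this sketch the first request wins; an assumption).
        public DictKind request(String column, DictKind wanted) {
            DictKind existing = byColumn.putIfAbsent(column, wanted);
            return existing == null ? wanted : existing;
        }
    }

    public static void main(String[] args) {
        DictionaryRegistry reg = new DictionaryRegistry();
        // The count distinct measure asks for a global dictionary...
        System.out.println(reg.request("USER_ID", DictKind.GLOBAL_APPEND)); // GLOBAL_APPEND
        // ...then the dimension asks for an ordinary one and gets the GD,
        // which cannot decode ids back to values at query time.
        System.out.println(reg.request("USER_ID", DictKind.ORDINARY_TRIE)); // GLOBAL_APPEND
    }
}
```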
>>>>>>>>>>
>>>>>>>>>> 2017-06-22 13:37 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> I see what you mean @ShaoFeng Shi.
>>>>>>>>>>>
>>>>>>>>>>> I noticed one of the measures I have defined is also a dimension. So
>>>>>>>>>>> what can I do in this case? It is needed both as a count distinct
>>>>>>>>>>> measure and as a dimension. The typical dictionary gives a Java heap
>>>>>>>>>>> space error; it has approximately 25M unique keys. Any ideas on how
>>>>>>>>>>> Kylin can best handle this? Should I remove it from GD and add it as a
>>>>>>>>>>> dimension with fixed-length encoding?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 21, 2017 at 10:33 PM, Sonny Heer <sonnyheer@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> No, not as a dimension. Only for count distinct measures.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 21, 2017 at 10:25 PM, ShaoFeng Shi <shaofengshi@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Sonny, are you using GlobalDictionary for a dimension? If so,
>>>>>>>>>>>>> please change it to use an ordinary dictionary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The GlobalDictionary is a "one-way" dictionary: it can only encode a
>>>>>>>>>>>>> String to an integer; it doesn't support decoding the String from an
>>>>>>>>>>>>> integer. The main use of GlobalDictionary is precise Count Distinct:
>>>>>>>>>>>>> the bitmap only accepts integers as input, so Kylin uses the GD to do
>>>>>>>>>>>>> the conversion.
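A minimal sketch of the "one-way" idea, with `java.util.BitSet` standing in for the bitmap and a hypothetical `OneWayDictionary` class standing in for the GlobalDictionary (neither is Kylin's implementation): each new string gets the next integer id, the ids are set in the bitmap, and the bitmap's cardinality is the exact distinct count. No reverse mapping is kept, so decoding throws, much like the AppendTrieDictionary exception Sonny reports when such a column is used as a dimension.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalDictSketch {

    // Hypothetical one-way dictionary: String -> int only, ids in append order.
    public static class OneWayDictionary {
        private final Map<String, Integer> ids = new HashMap<>();

        public int encode(String value) {
            Integer id = ids.get(value);
            if (id == null) {
                id = ids.size(); // next unused id
                ids.put(value, id);
            }
            return id;
        }

        // No reverse map is kept, so decoding is impossible by design.
        public String decode(int id) {
            throw new UnsupportedOperationException("one-way dictionary can't retrieve value from id");
        }
    }

    // Precise count distinct: set one bit per encoded id, then count the bits.
    public static int countDistinct(List<String> values, OneWayDictionary dict) {
        BitSet bitmap = new BitSet();
        for (String v : values) {
            bitmap.set(dict.encode(v));
        }
        return bitmap.cardinality();
    }

    public static void main(String[] args) {
        OneWayDictionary dict = new OneWayDictionary();
        List<String> userIds = Arrays.asList("u1", "u2", "u1", "u3", "u2");
        System.out.println(countDistinct(userIds, dict)); // 3
    }
}
```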
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2017-06-22 6:23 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> After finally getting the global dictionary to work when building the
>>>>>>>>>>>>>> cube, there are now exceptions during queries.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ERROR in query:
>>>>>>>>>>>>>> "AppendTrieDictionary can't retrive value from id"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is where it ends up in the code:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     @Override
>>>>>>>>>>>>>>     final protected T getValueFromIdImpl(int id) {
>>>>>>>>>>>>>>         throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     @Override
>>>>>>>>>>>>>>     protected byte[] getValueBytesFromIdImpl(int id) {
>>>>>>>>>>>>>>         throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     @Override
>>>>>>>>>>>>>>     protected int getValueBytesFromIdImpl(int id, byte[] returnValue, int offset) {
>>>>>>>>>>>>>>         throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Shaofeng Shi 史少锋
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Sonny S. Heer
>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>> m: 360-434-4354 h: 509-884-2574
>>>>>>>>>>>


-- 
Best regards,

Shaofeng Shi 史少锋
