kylin-user mailing list archives

From Sonny Heer <sonnyh...@gmail.com>
Subject Re: AppendTrieDictionary with GlobalDictionary 1.6
Date Thu, 22 Jun 2017 22:30:04 GMT
Thanks ShaoFeng.

So, to clarify: the UHC dimension is an integer column. Can I set its encoding
to integer and also include it in the GD for count distinct? Or should I leave
it out of the GD and use integer encoding only?



On Wed, Jun 21, 2017 at 10:55 PM, ShaoFeng Shi <shaofengshi@apache.org>
wrote:

> Hi Sonny,
>
> I see; it is a defect: Kylin uses at most one dictionary per column, so it
> cannot differentiate between an ordinary dict and a Global dict when that
> column is used as both a dimension and a measure.
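To make the defect concrete, here is a hypothetical Java model of it. DictRegistry, DictKind and Role are invented names for illustration, not Kylin's API. If dictionaries are looked up by column name alone, registering a Global dictionary for the count distinct measure replaces the ordinary dictionary that the dimension needs for decoding; keying the lookup by column and role keeps both.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the "one dictionary per column" defect; not Kylin code.
class DictRegistry {
    enum DictKind {
        ORDINARY(true), // two-way: ids can be decoded back to values
        GLOBAL(false);  // one-way: encode only

        final boolean canDecode;

        DictKind(boolean canDecode) {
            this.canDecode = canDecode;
        }
    }

    enum Role { DIMENSION, MEASURE }

    // Buggy lookup: one slot per column, so the last registration wins.
    private final Map<String, DictKind> byColumn = new HashMap<>();
    // Fixed lookup: one slot per (column, role) pair.
    private final Map<String, DictKind> byColumnAndRole = new HashMap<>();

    void register(String column, Role role, DictKind kind) {
        byColumn.put(column, kind);
        byColumnAndRole.put(column + "#" + role, kind);
    }

    DictKind lookupByColumn(String column) {
        return byColumn.get(column);
    }

    DictKind lookupByColumnAndRole(String column, Role role) {
        return byColumnAndRole.get(column + "#" + role);
    }

    public static void main(String[] args) {
        DictRegistry registry = new DictRegistry();
        // The same column is used as a dimension and as a count distinct measure.
        registry.register("USER_ID", Role.DIMENSION, DictKind.ORDINARY);
        registry.register("USER_ID", Role.MEASURE, DictKind.GLOBAL);

        // Column-only lookup hands the dimension the one-way GLOBAL dict.
        System.out.println(registry.lookupByColumn("USER_ID").canDecode); // false
        // Column-and-role lookup still finds the ordinary dict.
        System.out.println(registry.lookupByColumnAndRole("USER_ID", Role.DIMENSION).canDecode); // true
    }
}
```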
>
> 25 million is an Ultra High Cardinality (UHC) dimension; it is not suitable
> for a dict, as the dict size would exceed the Java heap size. In this case,
> please use fixed_length encoding; if that column is of integer or long type,
> you can use "integer" encoding. Meanwhile, keep using GD for the count
> distinct measure.
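The reason "integer" encoding scales where a dictionary does not is that it is a pure arithmetic transform: each value is written into a fixed number of bytes, so nothing proportional to the 25 million distinct values has to sit on the Java heap. A minimal Java sketch of such an encoding follows; the class name FixedIntEncoding and its byte layout (big-endian, with the sign bit flipped so that unsigned byte order matches numeric order) are assumptions for illustration, not Kylin's actual implementation.

```java
import java.util.Arrays;

// Hypothetical fixed-length integer encoding: no dictionary, so memory use
// does not depend on the number of distinct values. Not Kylin's encoder.
final class FixedIntEncoding {
    private final int length; // bytes per encoded value, 1..8

    FixedIntEncoding(int length) {
        if (length < 1 || length > 8) {
            throw new IllegalArgumentException("length must be between 1 and 8");
        }
        this.length = length;
    }

    byte[] encode(long value) {
        int bits = length * 8;
        if (bits < 64) {
            long min = -(1L << (bits - 1));
            long max = (1L << (bits - 1)) - 1;
            if (value < min || value > max) {
                throw new IllegalArgumentException(value + " does not fit in " + length + " bytes");
            }
        }
        // Flip the sign bit so that comparing the bytes as unsigned values
        // gives the same order as comparing the original numbers.
        long biased = value ^ (1L << (bits - 1));
        byte[] out = new byte[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = (byte) biased; // keep the lowest 8 bits
            biased >>>= 8;
        }
        return out;
    }

    long decode(byte[] bytes) {
        if (bytes.length != length) {
            throw new IllegalArgumentException("expected " + length + " bytes");
        }
        int bits = length * 8;
        long biased = 0;
        for (byte b : bytes) {
            biased = (biased << 8) | (b & 0xFF);
        }
        long value = biased ^ (1L << (bits - 1));
        if (bits < 64) {
            // Sign-extend from `bits` wide to a full 64-bit long.
            int shift = 64 - bits;
            value = (value << shift) >> shift;
        }
        return value;
    }

    public static void main(String[] args) {
        FixedIntEncoding enc = new FixedIntEncoding(4);
        long[] samples = {-1L, 0L, 25_000_000L};
        for (long v : samples) {
            byte[] encoded = enc.encode(v);
            System.out.println(v + " -> " + Arrays.toString(encoded) + " -> " + enc.decode(encoded));
        }
    }
}
```

The memory cost is the same whether the column has 25 thousand or 25 million distinct values, which is the property a dictionary lacks; the trade-off is that the byte length must be wide enough for the largest value, which encode() checks.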
>
> 2017-06-22 13:37 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>
>> I see what you mean, @ShaoFeng Shi.
>>
>> I noticed that one of the measures I have defined is also a dimension. What
>> can I do in this case? The column is needed both as a count distinct measure
>> and as a dimension. The typical dictionary gives a Java heap space error, as
>> the column has approximately 25M unique keys. Any ideas on how Kylin can best
>> handle this? Should I remove it from the GD and add it as a dimension with
>> fixed length encoding?
>>
>> On Wed, Jun 21, 2017 at 10:33 PM, Sonny Heer <sonnyheer@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> No, not as a dimension, only for count distinct measures.
>>>
>>>
>>> On Wed, Jun 21, 2017 at 10:25 PM, ShaoFeng Shi <shaofengshi@apache.org>
>>> wrote:
>>>
>>>> Hi Sonny, are you using GlobalDictionary for a dimension? If so, please
>>>> change it to an ordinary dictionary.
>>>>
>>>> The GlobalDictionary is a "one-way" dictionary: it can only encode a
>>>> String to an integer; it doesn't support decoding the String from an
>>>> integer. The main use of GlobalDictionary is precise Count Distinct: the
>>>> bitmap only accepts integers as input, so Kylin uses the GD to do the
>>>> conversion.
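A minimal Java sketch of that pipeline, under stated assumptions: a HashMap stands in for Kylin's AppendTrieDictionary, java.util.BitSet stands in for the compressed bitmap behind precise count distinct, and OneWayDict is a hypothetical name. Each distinct string receives a stable integer id, the ids are set as bits, and the precise distinct count is the number of set bits. Because only the String-to-id direction is stored, decode() throws, much like the UnsupportedOperationException that AppendTrieDictionary raises in the code quoted later in this thread.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical one-way dictionary: String -> int only. Not Kylin's
// AppendTrieDictionary, just the same contract in miniature.
final class OneWayDict {
    private final Map<String, Integer> ids = new HashMap<>();

    // Append-only: an unseen value gets the next id, a known value keeps its id.
    int encode(String value) {
        Integer id = ids.get(value);
        if (id == null) {
            id = ids.size();
            ids.put(value, id);
        }
        return id;
    }

    // No reverse mapping is stored, so decoding is unsupported by design.
    String decode(int id) {
        throw new UnsupportedOperationException("one-way dictionary cannot retrieve value from id " + id);
    }

    // Precise count distinct: one bit per encoded value, then count the set bits.
    static int countDistinct(OneWayDict dict, List<String> values) {
        BitSet bitmap = new BitSet();
        for (String v : values) {
            bitmap.set(dict.encode(v));
        }
        return bitmap.cardinality();
    }

    public static void main(String[] args) {
        OneWayDict dict = new OneWayDict();
        List<String> users = Arrays.asList("alice", "bob", "alice", "carol", "bob");
        System.out.println(countDistinct(dict, users)); // 3
        try {
            dict.decode(0);
        } catch (UnsupportedOperationException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

Ids are never reassigned once handed out, so bitmaps built from different batches of data can be combined with BitSet.or without double counting; that append-only property is what the sketch imitates, not the real dictionary's storage format.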
>>>>
>>>> 2017-06-22 6:23 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
>>>>
>>>>> After finally getting the global dictionary to work when building the
>>>>> cube, I now get exceptions during queries.
>>>>>
>>>>> ERROR in query:
>>>>> "AppendTrieDictionary can't retrive value from id"
>>>>>
>>>>>
>>>>> Here is where it ends up in the code:
>>>>>
>>>>>     @Override
>>>>>     final protected T getValueFromIdImpl(int id) {
>>>>>         throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected byte[] getValueBytesFromIdImpl(int id) {
>>>>>         throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected int getValueBytesFromIdImpl(int id, byte[] returnValue, int offset) {
>>>>>         throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
>>>>>     }
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>>
>>> Sonny S. Heer
>>> Senior Software Engineer
>>> m: 360-434-4354 h: 509-884-2574
>>>
>>
>>
>>
>> --
>>
>>
>> Sonny S. Heer
>> Senior Software Engineer
>> m: 360-434-4354 h: 509-884-2574
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


-- 


Sonny S. Heer
Senior Software Engineer
m: 360-434-4354 h: 509-884-2574
