kylin-issues mailing list archives

From "Dong Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
Date Wed, 20 Dec 2017 15:40:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298651#comment-16298651 ]

Dong Li commented on KYLIN-2956:
--------------------------------

Thanks Gang!

I thought that the mask should be 0xFFFF8000, right?
It would also be nice to have some unit tests (UTs) covering this method.

> building trie dictionary blocked on value of length over 4095 
> --------------------------------------------------------------
>
>                 Key: KYLIN-2956
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2956
>             Project: Kylin
>          Issue Type: Bug
>          Components: General
>            Reporter: Wang, Gang
>            Assignee: Wang, Gang
>         Attachments: 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch
>
>
> In the new release, Kylin checks the value length when building a trie dictionary, in the buildTrieBytes method of class TrieDictionaryBuilder, through this method:
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
>     }
> }
> public static boolean isPositiveShort(int i) {
>     return (i & 0xFFFF7000) == 0;
> }
> 0xFFFF7000 in binary is 1111 1111 1111 1111 0111 0000 0000 0000, so the value length can be at most 0000 0000 0000 0000 0000 1111 1111 1111, which is 4095 in decimal.
> I wonder why it is 0xFFFF7000. Should it be 0xFFFF8000 (1111 1111 1111 1111 1000 0000 0000 0000), which supports a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767)?
> Or, if 32767 is too large, I would prefer 0xFFFFE000 (1111 1111 1111 1111 1110 0000 0000 0000), which supports a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191).
>      
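
The mask arithmetic discussed in the issue can be checked with a small standalone sketch. The class MaskCheck and its method names are hypothetical and not part of Kylin; only the (i & mask) == 0 test is taken from the quoted isPositiveShort. It assumes that the smallest value a mask rejects is its lowest set bit, which holds for any mask applied this way.

```java
// Sketch only: MaskCheck, passes and firstRejected are hypothetical names, not Kylin code.
class MaskCheck {
    // Same test shape as the quoted isPositiveShort: a value passes if no masked bit is set.
    static boolean passes(int mask, int value) {
        return (value & mask) == 0;
    }

    // Smallest non-negative value the mask rejects: the lowest bit set in the mask.
    static int firstRejected(int mask) {
        return Integer.lowestOneBit(mask);
    }

    public static void main(String[] args) {
        int[] masks = {0xFFFF7000, 0xFFFF8000, 0xFFFFE000};
        for (int mask : masks) {
            System.out.printf("mask 0x%08X: first rejected length = %d%n",
                    mask, firstRejected(mask));
        }
        // 0xFFFF7000 leaves bit 15 unmasked, so 32768 is accepted although 4096 is not.
        System.out.println(passes(0xFFFF7000, 4096));
        System.out.println(passes(0xFFFF7000, 32768));
        // 0xFFFF8000 accepts exactly the range 0..32767.
        System.out.println(passes(0xFFFF8000, 32767));
        System.out.println(passes(0xFFFF8000, 32768));
    }
}
```

Besides confirming the 4095, 32767 and 8191 limits, the sketch shows that 0xFFFF7000 does not cover bit 15, so some values from 32768 upward pass the check, while 0xFFFF8000 and 0xFFFFE000 reject everything above their limit.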



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
