Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 38B72200CBE for ; Fri, 23 Jun 2017 00:30:11 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 37344160BF1; Thu, 22 Jun 2017 22:30:11 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id F1DAE160BE7 for ; Fri, 23 Jun 2017 00:30:09 +0200 (CEST) Received: (qmail 34033 invoked by uid 500); 22 Jun 2017 22:30:07 -0000 Mailing-List: contact user-help@kylin.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@kylin.apache.org Delivered-To: mailing list user@kylin.apache.org Received: (qmail 33954 invoked by uid 99); 22 Jun 2017 22:30:07 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Jun 2017 22:30:07 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 300601A07CE for ; Thu, 22 Jun 2017 22:30:07 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.879 X-Spam-Level: ** X-Spam-Status: No, score=2.879 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, KAM_NUMSUBJECT=0.5, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) 
with ESMTP id nQIvXJQTKbYi for ; Thu, 22 Jun 2017 22:30:05 +0000 (UTC) Received: from mail-qk0-f179.google.com (mail-qk0-f179.google.com [209.85.220.179]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 769295F6C1 for ; Thu, 22 Jun 2017 22:30:05 +0000 (UTC) Received: by mail-qk0-f179.google.com with SMTP id d14so23623958qkb.1 for ; Thu, 22 Jun 2017 15:30:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=1GUCHtsfKz+44KOT7qRqLBTZe8hKcCZY53s2JAupjsU=; b=WXw1QzJT5t1KW9CaWZ4Qy/pIoHqiN9zDPJ0OehaWSvdCGN+eDiSbn4t/0kD7Ndp7xy qw2STqPdCQF4OR/MDTDio9qQY5oUJcI30DqwC2RjgJfuXoE/FgvAlxudD1TV8J/jjIbT B1SfSGhNn9RL+eYr0l0KSBsPB+UfgjvMg6CXhlTMg+7wiERDIXK0fFdfe+FS1ELeeXQa tzxwy6YAOatW3dtOesQ3QxEONGAJPDy2iHyxizKssFvq9RgMgVf6dcxlX6ktsfvyqJ+4 cvTZ3JM/DT3KoJp0B4FxcGochkSJEt+cA02J+RMY6NgJrbG6ybE4WNouAeDBSHVl4PTx aXvQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=1GUCHtsfKz+44KOT7qRqLBTZe8hKcCZY53s2JAupjsU=; b=Ys2nfO4KH1/CxOGf4K5CBnpo0zqMAozePpIWcDAQ7P+QvNJM0U+kJW5bZmJUbC54AO m2T3G/dmKIrxeGs8oeaOZCUbuXFHrrG19MacXLlpLBDJYGXCV3h5wixcIgY6esUl4EXg zsYKkYCSIlq9nvBzuYXiafpfDipT0oiS6UwNNxiMIThGvTrZOTV8++lPopz37A+rQ9RM 5p06fLtNnQpP7nPNcuA4gHwt+7HhXQQ7Bfurf+LmBPSZe3ZrIRVQymGlKHjIYg2TJOPt KJRbO190GlbcWbXk+r70CGpQ7XnqDHOGNW11K482HtiyRzpx/YY6hCQYv7Flqk9sZZxI lY4w== X-Gm-Message-State: AKS2vOwAlbkwSWXNzhudGPfdem55vRhQ/TTgo0Uzak+3lV+qS6yvrQPz +7sbvyJHD9paH2HcUWy2gdGMyfyvZQ== X-Received: by 10.55.65.22 with SMTP id o22mr5925918qka.158.1498170604985; Thu, 22 Jun 2017 15:30:04 -0700 (PDT) MIME-Version: 1.0 Received: by 10.140.36.135 with HTTP; Thu, 22 Jun 2017 15:30:04 -0700 (PDT) In-Reply-To: References: From: Sonny Heer Date: Thu, 22 Jun 2017 15:30:04 -0700 Message-ID: Subject: Re: AppendTrieDictionary with 
GlobalDictionary 1.6
To: user@kylin.apache.org
Content-Type: multipart/alternative; boundary="001a114ac490b922bc0552940654"
archived-at: Thu, 22 Jun 2017 22:30:11 -0000

--001a114ac490b922bc0552940654
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Thanks ShaoFeng.

So to clarify: the UHC dimension is an integer. Can I set its encoding to
integer and also include it in the GD for count distinct? Or should I leave
it out of the GD and add it with integer encoding only?

On Wed, Jun 21, 2017 at 10:55 PM, ShaoFeng Shi wrote:

> Hi Sonny,
>
> I see; it is a defect: for one column Kylin uses at most one dictionary, so
> it can't differentiate between the ordinary dict and the Global dict when
> that column is used as both a dimension and a measure.
>
> 25 million is an Ultra High Cardinality dimension; it is not suitable for
> dict because the dict size will exceed the Java heap size. In this case,
> please use fixed_length encoding; if that column is an integer or long
> type, you can use "integer" encoding. In the meantime, keep using GD for
> the count distinct measure.
>
> 2017-06-22 13:37 GMT+08:00 Sonny Heer:
>
>> I see what you mean @ShaoFeng Shi.
>>
>> I noticed one of the measures I have defined is also a dimension. So
>> what can I do in this case? It is needed both as a count distinct measure
>> and as a dimension. The typical dictionary gives a Java heap space error.
>> It has approximately 25m unique keys. Any ideas on how best Kylin can
>> handle this? Should I remove it as GD and add it as a dimension with
>> fixed length?
>>
>> On Wed, Jun 21, 2017 at 10:33 PM, Sonny Heer wrote:
>>
>>> Hi,
>>>
>>> No, not as a dimension. Only for Count distinct measures.
>>>
>>> On Wed, Jun 21, 2017 at 10:25 PM, ShaoFeng Shi wrote:
>>>
>>>> Hi Sonny, are you using GlobalDictionary for a dimension? If so, pls
>>>> change to use an ordinary dictionary.
>>>>
>>>> The GlobalDictionary is a "one-way" dictionary: it can only encode a
>>>> String to an integer; it doesn't support decoding the String from an
>>>> integer. The main use of GlobalDictionary is precise Count Distinct:
>>>> the bitmap only accepts integers as input, so Kylin uses the GD to do
>>>> the conversion.
>>>>
>>>> 2017-06-22 6:23 GMT+08:00 Sonny Heer:
>>>>
>>>>> After finally getting the global dictionary to work when building the
>>>>> cube, there are now exceptions during query.
>>>>>
>>>>> ERROR in query:
>>>>> "AppendTrieDictionary can't retrive value from id"
>>>>>
>>>>> Here is where it ends up in the code:
>>>>>
>>>>>     @Override
>>>>>     final protected T getValueFromIdImpl(int id) {
>>>>>         throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected byte[] getValueBytesFromIdImpl(int id) {
>>>>>         throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected int getValueBytesFromIdImpl(int id, byte[] returnValue, int offset) {
>>>>>         throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
>>>>>     }
>>>>>
>>>>> --
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>
>>> --
>>> Sonny S. Heer
>>> Senior Software Engineer
>>> m: 360-434-4354 h: 509-884-2574
>>
>> --
>> Sonny S. Heer
>> Senior Software Engineer
>> m: 360-434-4354 h: 509-884-2574
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋

--
Sonny S. Heer
Senior Software Engineer
m: 360-434-4354 h: 509-884-2574

--001a114ac490b922bc0552940654
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Thanks ShaoFeng.

So to clarify: the UHC dimension is an integer. Can I set its encoding to integer and also include it in the GD for count distinct? Or should I leave it out of the GD and add it with integer encoding only?



On Wed, Jun 21, 2017 at 10:55 PM, ShaoFeng Shi <shaofengshi@apache.org> wrote:
Hi Sonny,

I see; it is a defect: for one column Kylin uses at most one dictionary, so it can't differentiate between the ordinary dict and the Global dict when that column is used as both a dimension and a measure.

25 million is an Ultra High Cardinality dimension; it is not suitable for dict because the dict size will exceed the Java heap size. In this case, please use fixed_length encoding; if that column is an integer or long type, you can use "integer" encoding. In the meantime, keep using GD for the count distinct measure.
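
As a rough illustration of why an integer column needs no dictionary: the value can serve as its own fixed-width code and be decoded directly, so nothing has to be held in the heap per distinct value. The sketch below assumes non-negative values that fit in the chosen width. `FixedWidthIntCodec` and its methods are hypothetical names for illustration, not Kylin's actual "integer" encoding implementation, which may differ in details such as sign handling.

```java
/** Hypothetical fixed-width integer codec; illustrative only, not Kylin code. */
class FixedWidthIntCodec {

    /** Writes a non-negative value into exactly `width` bytes, big-endian. */
    static byte[] encode(long value, int width) {
        if (width < 1 || width > 8) {
            throw new IllegalArgumentException("width must be between 1 and 8");
        }
        if (value < 0 || (width < 8 && (value >>> (8 * width)) != 0)) {
            throw new IllegalArgumentException("value does not fit in " + width + " bytes");
        }
        byte[] out = new byte[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = (byte) value; // keep the lowest 8 bits
            value >>>= 8;
        }
        return out;
    }

    /** Reads the bytes back as an unsigned big-endian number; no lookup table needed. */
    static long decode(byte[] bytes) {
        long value = 0;
        for (byte b : bytes) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] encoded = encode(25_000_000L, 4);
        System.out.println(encoded.length + " bytes, decoded back to " + decode(encoded));
    }
}
```

Unlike a dictionary, the memory cost here is independent of how many distinct values the column holds.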

2017-06-22 13:37 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
I see what you mean @ShaoFeng Shi.

I noticed one of the measures I have defined is also a dimension. So what can I do in this case? It is needed both as a count distinct measure and as a dimension. The typical dictionary gives a Java heap space error. It has approximately 25m unique keys. Any ideas on how best Kylin can handle this? Should I remove it as GD and add it as a dimension with fixed length?

On Wed, Jun 21, 2017 at 10:33 PM, Sonny Heer <sonnyheer@gmail.com> wrote:
Hi,

No, not as a dimension. Only for Count distinct measures.


On Wed, Jun 21, 2017 at 10:25 PM, ShaoFeng Shi <shaofengshi@apache.org> wrote:
Hi Sonny, are you using GlobalDictionary for a dimension? If so, pls change to use an ordinary dictionary.

The GlobalDictionary is a "one-way" dictionary: it can only encode a String to an integer; it doesn't support decoding the String from an integer. The main use of GlobalDictionary is precise Count Distinct: the bitmap only accepts integers as input, so Kylin uses the GD to do the conversion.
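
A minimal sketch of the idea described above, under stated assumptions: `OneWayDictionary` and `PreciseCountDistinct` are hypothetical names, a `HashMap` stands in for the trie, and `java.util.BitSet` stands in for the bitmap. It is not Kylin's AppendTrieDictionary, only an illustration of encode-only ids feeding a bitmap.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical encode-only dictionary; a stand-in, not Kylin's implementation. */
class OneWayDictionary {
    private final Map<String, Integer> ids = new HashMap<>();

    /** Returns the id already assigned to value, or assigns the next free id. */
    int encode(String value) {
        Integer id = ids.get(value);
        if (id == null) {
            id = ids.size();
            ids.put(value, id);
        }
        return id;
    }

    /** Decoding is deliberately unsupported, which is why it cannot back a dimension. */
    String decode(int id) {
        throw new UnsupportedOperationException("one-way dictionary cannot decode id " + id);
    }
}

class PreciseCountDistinct {

    /** Counts distinct values by setting one bit per dictionary id. */
    static int countDistinct(OneWayDictionary dict, String... values) {
        BitSet bitmap = new BitSet();
        for (String value : values) {
            bitmap.set(dict.encode(value));
        }
        return bitmap.cardinality();
    }

    public static void main(String[] args) {
        OneWayDictionary dict = new OneWayDictionary();
        System.out.println(countDistinct(dict, "a", "b", "a", "c"));
    }
}
```

Counting only needs the ids, so `decode` is never called. A query that selects or groups by the column as a dimension has to turn ids back into values, which is where an encode-only dictionary fails.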

2017-06-22 6:23 GMT+08:00 Sonny Heer <sonnyheer@gmail.com>:
After finally getting the global dictionary to work when building the cube, there are now exceptions during query.

ERROR in query:
"AppendTrieDictionary can't retrive value from id"


Here is where it ends up in the code:

    @Override
    final protected T getValueFromIdImpl(int id) {
        throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
    }

    @Override
    protected byte[] getValueBytesFromIdImpl(int id) {
        throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
    }

    @Override
    protected int getValueBytesFromIdImpl(int id, byte[] returnValue, int offset) {
        throw new UnsupportedOperationException("AppendTrieDictionary can't retrive value from id");
    }


--




--
Best regards,

Shaofeng Shi 史少锋




--

Sonny S. Heer
Senior Software Engineer
m: 360-434-4354 h: 509-884-2574



--001a114ac490b922bc0552940654--