From: Li Yang
To: user@kylin.incubator.apache.org
Subject: Re: Timestamp related issues
Date: Thu, 29 Oct 2015 18:08:12 +0800

> 1. Is there any issue with Timestamp/Date values ?

Timestamp testing is very limited on the 1.x branch. All the use cases I know of use date instead of timestamp. The 2.x branch has much better timestamp support.

> 2. For measures with distinct count, it uses approximations with certain
> error rates, lowest of which is <1.22%. Does this guarantee that counts
> would be accurate ?

The short answer is no, there is no 100% guarantee. The count distinct algorithm behind this is HyperLogLog [1]. Its error follows a normal distribution. The "< 1.22%" is shorthand for saying that, in theory, the error is below 1.22% for 99.7% of all results. The remaining 0.3% of results can still exceed that bound.

[1] https://en.wikipedia.org/wiki/HyperLogLog

On Tue, Oct 27, 2015 at 12:45 PM, Chetan Dixit wrote:

> Hello Kylin Team,
>
> We are facing the following issues while using Kylin. Could you please help?
>
> 1. Is there any issue with Timestamp/Date values ?
>
>    We see issues in queries using “WHERE columnname = timestamp
>    ‘2015-07-23 10:30:00’”: it does not return any results.
>
>    If we use “WHERE columnname = ‘2015-07-23 10:30:00’”, it returns ERROR.
>
>    If we use a timestamp column in the projection list, it truncates the
>    time part, i.e. 2015-07-23 10:30:00 becomes 2015-07-23 00:00:00.
>
> 2. For measures with distinct count, it uses approximations with certain
>    error rates, lowest of which is <1.22%. Does this guarantee that counts
>    would be accurate ?
>
>    For a count of 1000 we have seen results such as 982, 1000, etc.
>
> Thanks,
> Chetan
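The "< 1.22%" figure and the 99.7% in the reply can be read together as a three-sigma bound on a normally distributed error. The sketch below is only an illustration of that arithmetic, not code taken from Kylin. It assumes the commonly cited HyperLogLog standard error of 1.04 / sqrt(m) for m registers, and it assumes a precision of 16 bits (m = 65536), chosen because it reproduces the quoted number. The function names are hypothetical.

```python
import math


def hll_standard_error(precision_bits: int) -> float:
    """Theoretical relative standard error of HyperLogLog with
    m = 2**precision_bits registers (commonly cited as 1.04 / sqrt(m))."""
    registers = 2 ** precision_bits
    return 1.04 / math.sqrt(registers)


def normal_coverage(k_sigma: float) -> float:
    """Fraction of a normal distribution that lies within
    k_sigma standard deviations of its mean."""
    return math.erf(k_sigma / math.sqrt(2))


def relative_error(estimate: float, actual: float) -> float:
    """Absolute relative error of an estimate against the true value."""
    return abs(estimate - actual) / actual


if __name__ == "__main__":
    precision_bits = 16  # assumption: not stated anywhere in the thread
    sigma = hll_standard_error(precision_bits)

    print(f"standard error:   {sigma:.4%}")
    print(f"3-sigma bound:    {3 * sigma:.2%}")
    print(f"3-sigma coverage: {normal_coverage(3):.1%}")

    # The figures reported in the original question: 982 returned for 1000.
    print(f"982 vs 1000:      {relative_error(982, 1000):.2%}")
```

Under these assumptions the three-sigma bound comes out at about 1.22% and covers about 99.7% of results, matching the reply. A result of 982 for a true count of 1000 is a relative error of 1.8%.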