kylin-user mailing list archives

From Prashant Prakash <prash.i...@gmail.com>
Subject HyperLogLogPlusCounter Usage
Date Mon, 11 Jan 2016 19:49:26 GMT
Hi,

I am experiencing a strange issue with a COUNT(DISTINCT) query in Kylin. We
are using hllc12 to estimate unique counts for a measure in a table
partitioned by date.
The unique estimates for the individual dates 2016-01-07, 2016-01-08 and
2016-01-09 are 93,728,324, 90,982,364 and 45,485,278 respectively.
However, the unique count across all three days, which is computed through
the HyperLogLogPlusCounter.merge operation, comes out as 67,980,576.
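For context, here is a minimal sketch of how HyperLogLog merging works in the standard formulation: each sketch keeps 2^p registers, and the union of two sketches is their register-wise maximum. Treating hllc12 as precision p = 12 (4,096 registers) is an assumption, and `MiniHLL` is a hypothetical toy class, not Kylin's HyperLogLogPlusCounter (it omits the HLL++ sparse encoding and bias correction).

```python
import hashlib
import math


class MiniHLL:
    """Toy HyperLogLog with 2**p registers (illustration only, not Kylin's class)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-1 digest as the item's hash.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                      # top p bits choose the register
        rest = h & ((1 << width) - 1)         # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "MiniHLL") -> "MiniHLL":
        # Union of two sketches = register-wise maximum; returns a new sketch.
        if other.p != self.p:
            raise ValueError("sketches must use the same precision")
        out = MiniHLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Linear counting for small cardinalities, as in the original HLL paper.
            return self.m * math.log(self.m / zeros)
        return raw


day1, day2 = MiniHLL(12), MiniHLL(12)
for i in range(20_000):
    day1.add(f"user{i}")
for i in range(15_000, 45_000):  # overlaps day1 on 5,000 users
    day2.add(f"user{i}")
both = day1.merge(day2)
print(round(day1.estimate()), round(day2.estimate()), round(both.estimate()))
```

Because every merged register is at least as large as the matching register in each input, the merged raw estimate cannot be smaller than either input's raw estimate.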

1. Is a distinct-count query across days a valid use of Kylin?

Sample query:
SELECT COUNT(DISTINCT f.userid) AS m1 FROM
kylin.fact_publishers_uniques f WHERE
dt in ('2016-01-09', '2016-01-08', '2016-01-07')

In theory, the number of uniques across days should be at least the maximum
of the per-day uniques (93,728,324 here), so the final number does not seem
correct.
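The bound argument can be written as a small check. For sets A_1, ..., A_k, the exact size of their union lies between the largest |A_i| and the sum of all |A_i|. The hypothetical helper `union_bound_check` applies this to estimates, with an assumed 5% relative tolerance to absorb sketch error.

```python
from typing import Sequence, Tuple


def union_bound_check(
    per_day: Sequence[int], union: float, rel_tol: float = 0.05
) -> Tuple[bool, int, int]:
    """Check max(per_day) <= union <= sum(per_day), widened by rel_tol.

    rel_tol is an assumed allowance for estimation error, not a Kylin setting.
    Returns (ok, lower_bound, upper_bound).
    """
    lower = max(per_day)
    upper = sum(per_day)
    ok = lower * (1 - rel_tol) <= union <= upper * (1 + rel_tol)
    return ok, lower, upper


print(union_bound_check([93_728_324, 90_982_364, 45_485_278], 67_980_576))
print(union_bound_check([93_728_324, 90_982_364], 164_637_916))
```

With the figures reported here, the three-day value fails the check (67,980,576 is below the per-day maximum of 93,728,324), while the two-day value of 164,637,916 passes.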
To debug the issue, we also calculated the uniques across 2016-01-07 and
2016-01-08, which come to 164,637,916. It is only when we merge in the data
for 2016-01-09 that we get a spurious value.
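The debugging approach of merging one day at a time and watching for the step where the total goes down can be sketched generically. `find_first_drop` is a hypothetical helper that takes the merge and estimate operations as parameters; here it is exercised with exact Python sets standing in for sketches, plus a deliberately faulty merge to show what a detected drop looks like.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

S = TypeVar("S")


def find_first_drop(
    sketches: List[S],
    merge: Callable[[S, S], S],
    estimate: Callable[[S], float],
    tol: float = 0.0,
) -> Optional[Tuple[int, float, float]]:
    """Merge sketches left to right and report the first step where the running
    estimate drops by more than `tol` (relative). Returns (index, before, after) or None."""
    if not sketches:
        raise ValueError("need at least one sketch")
    running = sketches[0]
    before = estimate(running)
    for i in range(1, len(sketches)):
        running = merge(running, sketches[i])
        after = estimate(running)
        if after < before * (1 - tol):
            return i, before, after
        before = after
    return None


days = [{1, 2, 3}, {3, 4}, {5}]
print(find_first_drop(days, lambda a, b: a | b, len))  # None: a true union never shrinks
print(find_first_drop(days, lambda a, b: b, len))      # (1, 3, 2): faulty merge keeps only the newer set
```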

2. Is there any constraint on the relative order in which the counters are merged?
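In the standard HyperLogLog formulation, merging is a register-wise maximum, which is commutative and associative, so the order in which sketches are merged should not change the result. The sketch below checks this on small random register arrays (16 registers with made-up values, a hypothetical setup) by merging three of them in every possible order.

```python
import random
from itertools import permutations
from typing import List, Sequence


def merge_registers(a: Sequence[int], b: Sequence[int]) -> List[int]:
    # HLL union: register-wise maximum (both sketches must have the same size).
    if len(a) != len(b):
        raise ValueError("sketches must have the same number of registers")
    return [max(x, y) for x, y in zip(a, b)]


def merge_all(sketches: Sequence[Sequence[int]]) -> List[int]:
    # Fold the sketches left to right in the given order.
    out = list(sketches[0])
    for s in sketches[1:]:
        out = merge_registers(out, s)
    return out


random.seed(0)
days = [[random.randint(0, 20) for _ in range(16)] for _ in range(3)]
results = {tuple(merge_all(list(order))) for order in permutations(days)}
print(len(results))  # 1: every merge order yields the same registers
```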

Regards,
Prashant
