kylin-issues mailing list archives

From "Zhong Yanghong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KYLIN-3487) Create a new measure for count distinct
Date Fri, 10 Aug 2018 07:28:00 GMT
Zhong Yanghong created KYLIN-3487:
-------------------------------------

             Summary: Create a new measure for count distinct
                 Key: KYLIN-3487
                 URL: https://issues.apache.org/jira/browse/KYLIN-3487
             Project: Kylin
          Issue Type: Improvement
            Reporter: Zhong Yanghong
            Assignee: Zhong Yanghong


At eBay, there is a requirement to calculate the distinct count of sessions. There are 20M
sessions each day. For deep-dive analysis, users want the session cardinality over a month, or
even several months. For a single month, the total cardinality is around 20M * 30 = 600M.


If a session never crosses a day boundary, it is pointless to merge the related counters
(bitmap or HLL) across days when calculating the distinct count of sessions.


Because a session never spans days, we may need a new measure that holds a map, with the date
as the key and a bitmap or HLL as the value. To calculate the distinct count, it only needs to
read the cardinality from each key-value entry and sum the results. There is no need to merge
bitmaps or HLLs across different key-value entries.
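
A minimal sketch of the idea, not Kylin code: the class name PerDayDistinctCounter and its
methods are hypothetical, and a plain Python set stands in for the bitmap or HLL value. It
assumes, as described above, that a session id never appears under two different dates.

```python
from collections import defaultdict


class PerDayDistinctCounter:
    """Hypothetical measure: one distinct-value container per date key."""

    def __init__(self):
        # date key -> set of session ids (a stand-in for a bitmap or HLL)
        self.per_day = defaultdict(set)

    def add(self, day, session_id):
        self.per_day[day].add(session_id)

    def merge(self, other):
        # Only entries that share a date key are unioned; entries with
        # different keys are kept side by side and never combined.
        for day, ids in other.per_day.items():
            self.per_day[day] |= ids
        return self

    def count_distinct(self):
        # Assumes sessions never span days, so per-day counts can be summed.
        return sum(len(ids) for ids in self.per_day.values())


a = PerDayDistinctCounter()
a.add("2018-08-01", "s1")
a.add("2018-08-01", "s2")
a.add("2018-08-01", "s1")  # duplicate within the same day

b = PerDayDistinctCounter()
b.add("2018-08-01", "s2")  # same day, already seen in a
b.add("2018-08-02", "s3")

total = a.merge(b).count_distinct()
print(total)
```

With this layout, a query over a month touches about 30 small per-day containers and adds up
their cardinalities, instead of building one container that covers the whole month.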



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)