phoenix-dev mailing list archives

From "Julian Hyde (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-3390) Custom UDAF for HyperLogLogPlus
Date Sun, 25 Jun 2017 22:29:00 GMT


Julian Hyde commented on PHOENIX-3390:

[~gjacoby], thanks for the heads-up. Even more pertinent than my data profiling work (which
is about computing 2^n approximate distinct counts simultaneously; see CALCITE-1616) is the
common requirement for fast, approximate distinct counts. Druid supports various sketches, and
we want to surface them in Calcite's Druid adapter (see CALCITE-1787 for theta sketches,
CALCITE-1587 for top-N, and CALCITE-1853 for knowing when an approximate count-distinct is acceptable).

Today many databases have a syntax for approximate aggregates; unfortunately, the syntaxes
rarely agree and are often too closely coupled to a particular algorithm (e.g. HyperLogLog).
I have logged CALCITE-1588 to introduce an {{APPROXIMATE}} clause, e.g. {{COUNT(DISTINCT customerId)
APPROXIMATE (WITHIN 10 PERCENT)}}. It would be great if Phoenix adopted that syntax.
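One reason an algorithm-neutral error bound is workable: for HyperLogLog, a requested error can be translated into a sketch size. The standard error of HyperLogLog with m = 2^p registers is about 1.04/sqrt(m) (Flajolet et al.). The minimal Python sketch below picks the smallest precision p that meets a requested relative error. It assumes that "WITHIN 10 PERCENT" means one standard error, which the comment does not settle, and it applies a floor of p = 4 as a common lower limit. The function name `hll_precision_for_error` is hypothetical and not part of Calcite or Phoenix.

```python
import math


def hll_precision_for_error(relative_error: float) -> int:
    """Hypothetical helper: smallest HyperLogLog precision p such that the
    standard error 1.04 / sqrt(2**p) is at most relative_error.

    Assumption: the requested bound is interpreted as one standard error.
    """
    if not 0 < relative_error < 1:
        raise ValueError("relative_error must be in (0, 1)")
    # 1.04 / sqrt(m) <= e  <=>  m >= (1.04 / e) ** 2
    m_min = (1.04 / relative_error) ** 2
    # Registers come in powers of two; p = 4 is used here as a lower floor.
    return max(4, math.ceil(math.log2(m_min)))
```

For example, `hll_precision_for_error(0.10)` gives p = 7 (128 registers, standard error about 9.2%), and `hll_precision_for_error(0.01)` gives p = 14. Other sketch families, such as Druid's theta sketches, would need their own mapping, which is the point of keeping the clause algorithm-neutral.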

> Custom UDAF for HyperLogLogPlus
> -------------------------------
>                 Key: PHOENIX-3390
>                 URL:
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Swapna Kasula
>            Assignee: Ethan Wang
>            Priority: Minor
> With reference to PHOENIX-2069.
> Custom UDAF that aggregates (unions) the HyperLogLogs of a column and returns a HyperLogLog.
> select hllUnion(col1) from table;  //returns a HyperLogLog that is the union of the
> HyperLogLogs from all rows of column 'col1'
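The requested {{hllUnion}} aggregate relies on one property of HyperLogLog. Two sketches built with the same precision and hash function can be merged by taking the register-wise maximum, and the result is identical to the sketch of the combined input. Below is a minimal Python illustration, not Phoenix's API or the PHOENIX-2069 code. It assumes SHA-1 truncated to 64 bits as the hash, precision between 4 and 16, and only the small-range (linear counting) correction. `HyperLogLog` and `hll_union` are hypothetical names.

```python
import hashlib
import math
from functools import reduce

# Bias constants from the HyperLogLog paper; m >= 128 uses the closed form.
_ALPHA = {16: 0.673, 32: 0.697, 64: 0.709}


class HyperLogLog:
    """Minimal HyperLogLog: 64-bit hash, small-range (linear counting) correction only."""

    def __init__(self, p: int = 12):
        if not 4 <= p <= 16:
            raise ValueError("precision p must be in [4, 16]")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value) -> int:
        # Assumption: SHA-1 of str(value), truncated to 64 bits, serves as the hash.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value) -> None:
        x = self._hash64(value)
        q = 64 - self.p
        idx = x >> q                      # top p bits choose the register
        rest = x & ((1 << q) - 1)         # remaining q bits
        rank = q - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        # Merging is a register-wise max; both sketches must share p and the hash.
        if other.p != self.p:
            raise ValueError("cannot union sketches with different precision")
        result = HyperLogLog(self.p)
        result.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return result

    def estimate(self) -> float:
        alpha = _ALPHA.get(self.m, 0.7213 / (1 + 1.079 / self.m))
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small n
        return raw


def hll_union(sketches):
    """Hypothetical stand-in for hllUnion(col1): fold one sketch per row into one."""
    return reduce(lambda acc, s: acc.union(s), sketches)
```

Register-wise max is associative and commutative. Partial unions computed on different servers can therefore be merged again in any order, and the final sketch stays the same. This is what makes a union aggregate a natural fit for distributed aggregation.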

This message was sent by Atlassian JIRA
