phoenix-dev mailing list archives

From "Vincent Poon (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-4724) Efficient Equi-Depth histogram for streaming data
Date Fri, 04 May 2018 17:27:00 GMT


Vincent Poon commented on PHOENIX-4724:

[~aertoria] Yes, the idea is that when someone wants to build an index table, we sample the data
table for some index column values and put them into this histogram.  Since this is an equi-depth
histogram, each bucket will have the same number of elements.  So we get the bucket bounds
from the histogram and use those to pre-split the index table.
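
To illustrate the idea of using bucket bounds as split points, here is a minimal sketch. It is not the code in the attached patch: it sorts a fixed in-memory sample instead of updating a streaming histogram, and the class name EquiDepthSplits and method splitPoints are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Phoenix).
class EquiDepthSplits {

    // Returns up to numBuckets - 1 split points so that each bucket
    // holds roughly the same number of sampled values.
    static List<Long> splitPoints(long[] sample, int numBuckets) {
        if (numBuckets < 1) {
            throw new IllegalArgumentException("numBuckets must be >= 1");
        }
        long[] sorted = sample.clone();
        Arrays.sort(sorted);

        List<Long> splits = new ArrayList<>();
        for (int i = 1; i < numBuckets; i++) {
            // Index of the first value that belongs to bucket i.
            int idx = (int) ((long) i * sorted.length / numBuckets);
            if (idx >= sorted.length) {
                break; // empty sample: no split points
            }
            long candidate = sorted[idx];
            // Skip duplicates so that split points stay strictly increasing.
            if (splits.isEmpty() || splits.get(splits.size() - 1) != candidate) {
                splits.add(candidate);
            }
        }
        return splits;
    }
}
```

For a sample holding the values 1 through 12 and 4 buckets, this yields the split points 4, 7 and 10, so each of the four resulting ranges covers three sampled values.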

The histogram is relatively small, so we can keep it in memory, and perhaps save it for each
table as you suggest.   It could then perhaps be used for query optimization or approximate
count queries.

> Efficient Equi-Depth histogram for streaming data
> -------------------------------------------------
>                 Key: PHOENIX-4724
>                 URL:
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>            Priority: Major
>         Attachments: PHOENIX-4724.v1.patch
> Equi-Depth histogram from,
> but without the sliding window - we assume a single window over the entire data set.
> Used to generate the bucket boundaries of a histogram where each bucket has the same
> # of items.
> This is useful, for example, for pre-splitting an index table, by feeding in data from
> the indexed column.
> Works on streaming data - the histogram is dynamically updated for each new value.

