phoenix-dev mailing list archives

From "Ethan Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4724) Efficient Equi-Depth histogram for streaming data
Date Fri, 04 May 2018 05:26:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463382#comment-16463382 ]

Ethan Wang commented on PHOENIX-4724:
-------------------------------------

[~vincentpoon]

If I understand correctly, once this feature is implemented, building an index table will
also record information into this histogram, so that at some later point you can
conveniently get the distribution of the data in the index table. Is that correct?

So do you store a histogram object for each index table, like a shadow object kept somewhere
offline? Also, will there ever be a case where you need to mutate an index or remove an index
from an existing index table?

Cool idea!

> Efficient Equi-Depth histogram for streaming data
> -------------------------------------------------
>
>                 Key: PHOENIX-4724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4724
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>            Priority: Major
>         Attachments: PHOENIX-4724.v1.patch
>
>
> Equi-Depth histogram from http://web.cs.ucla.edu/~zaniolo/papers/Histogram-EDBT2011-CamReady.pdf,
> but without the sliding window - we assume a single window over the entire data set.
> Used to generate the bucket boundaries of a histogram where each bucket has the same
> number of items.
> This is useful, for example, for pre-splitting an index table, by feeding in data from
> the indexed column.
> Works on streaming data - the histogram is dynamically updated for each new value.
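
For readers unfamiliar with the idea, here is a minimal Python sketch of a streaming
equi-depth histogram that keeps a bounded number of fine-grained bars, splits a bar when it
grows too large, and merges the two smallest adjacent bars to stay within budget. It is
an illustration only, not the code in PHOENIX-4724.v1.patch and not necessarily the exact
algorithm of the linked paper. The class name, `bars_per_bucket`, and `split_factor` are
hypothetical, and splitting assumes values are spread evenly inside a bar.

```python
import bisect
import random


class StreamingEquiDepthHistogram:
    """Approximate equi-depth histogram over a stream of numbers (illustrative)."""

    def __init__(self, num_buckets, bars_per_bucket=8, split_factor=1.7):
        self.num_buckets = num_buckets
        self.max_bars = num_buckets * bars_per_bucket  # internal bar budget
        self.split_factor = split_factor  # hypothetical tuning knob
        self.minimum = None  # smallest value seen so far
        self.uppers = []     # inclusive upper bound of each bar, ascending
        self.counts = []     # number of values attributed to each bar
        self.total = 0

    def add(self, value):
        """Update the histogram with one new value from the stream."""
        self.total += 1
        if not self.uppers:
            self.minimum = value
            self.uppers.append(value)
            self.counts.append(1)
            return
        if value < self.minimum:
            self.minimum = value      # widen the first bar downwards
        if value > self.uppers[-1]:
            self.uppers[-1] = value   # widen the last bar upwards
        i = bisect.bisect_left(self.uppers, value)
        self.counts[i] += 1
        too_big = self.counts[i] > self.split_factor * self.total / self.max_bars
        if too_big and self.counts[i] >= 2:
            self._split(i)

    def _split(self, i):
        lo = self.minimum if i == 0 else self.uppers[i - 1]
        hi = self.uppers[i]
        mid = (lo + hi) / 2
        if not lo < mid < hi:
            return  # bar too narrow to split (e.g. many duplicates)
        c = self.counts[i]
        self.uppers.insert(i, mid)
        # Assumption: half of the bar's values lie on each side of the midpoint.
        self.counts[i:i + 1] = [c // 2, c - c // 2]
        if len(self.uppers) > self.max_bars:
            self._merge_smallest_pair()

    def _merge_smallest_pair(self):
        j = min(range(len(self.counts) - 1),
                key=lambda k: self.counts[k] + self.counts[k + 1])
        self.counts[j:j + 2] = [self.counts[j] + self.counts[j + 1]]
        del self.uppers[j]  # the next bar's upper bound now closes the merged bar

    def boundaries(self):
        """Return num_buckets - 1 split points with roughly equal counts between them."""
        depth = self.total / self.num_buckets
        result, cumulative, lo, k = [], 0, self.minimum, 1
        for hi, count in zip(self.uppers, self.counts):
            # Emit every target rank that falls inside this bar.
            while k < self.num_buckets and cumulative + count >= k * depth:
                fraction = (k * depth - cumulative) / count
                result.append(lo + fraction * (hi - lo))
                k += 1
            cumulative += count
            lo = hi
        return result


if __name__ == "__main__":
    values = list(range(10_000))
    random.Random(0).shuffle(values)
    hist = StreamingEquiDepthHistogram(num_buckets=4)
    for v in values:
        hist.add(v)
    print(hist.boundaries())  # approximate quartile split points
```

In the pre-splitting use case from the description, the values fed to `add` would come from
the indexed column, and the result of `boundaries()` would serve as candidate split points
for the index table. The real implementation works on row keys rather than plain numbers.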



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
