accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4500) Implement visibility histograms as a table feature
Date Wed, 19 Oct 2016 15:27:58 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589044#comment-15589044 ]

Christopher Tubbs commented on ACCUMULO-4500:
---------------------------------------------

bq. :coughs: That... is a histogram.

I disagree. A histogram is a diagram... a particular kind of data visualization. Computer
scientists overload the term to mean the data structure that powers such a visualization,
and that's fine, but that's not even what we have here. A histogram has particular semantics:
the different values of a single variable are on one axis, and the frequencies of those values
are shown on the other. What we have here is more general than that, because we're not
necessarily referring to a single variable, nor do the magnitudes necessarily represent frequencies
or anything like frequencies. Calling it a histogram implies semantics we don't necessarily
need to impose.

bq. Can you sketch what you think this should look like? It's not entirely clear to me.

Something like:

{code:java}
public NamedCounters getCounters(Range range, Function<Long, Long> combiner);
{code}
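To make the shape of this a little more concrete, here is a minimal sketch of what a `NamedCounters` result could look like. It is hypothetical: it is not an existing Accumulo class, and the comment leaves its contents open. It assumes `NamedCounters` is just a map from a name (for example, a visibility expression) to a `long` magnitude that need not be a frequency. It also assumes the combiner merges two partial values, so it is modeled as a two-argument `BinaryOperator<Long>` rather than the `Function<Long, Long>` in the signature above. The `add` and `mergeFrom` methods are illustrative names only.

{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// HYPOTHETICAL sketch: NamedCounters is not an existing Accumulo class.
// It maps a name (e.g. a column visibility expression) to a long magnitude,
// which need not be a frequency.
public final class NamedCounters {
  // Sorted by name so printed output is deterministic.
  private final Map<String, Long> counters = new TreeMap<>();

  // Record a value for a name, combining it with any existing value for that name.
  public NamedCounters add(String name, long value, BinaryOperator<Long> combiner) {
    counters.merge(name, value, combiner);
    return this;
  }

  // Fold another partial result (e.g. from another file or tablet) into this one.
  public NamedCounters mergeFrom(NamedCounters other, BinaryOperator<Long> combiner) {
    other.counters.forEach((name, value) -> counters.merge(name, value, combiner));
    return this;
  }

  // Read-only view of the current magnitudes.
  public Map<String, Long> asMap() {
    return Collections.unmodifiableMap(counters);
  }

  public static void main(String[] args) {
    BinaryOperator<Long> sum = Long::sum;
    NamedCounters fileA = new NamedCounters().add("A&B", 3, sum).add("C", 1, sum);
    NamedCounters fileB = new NamedCounters().add("A&B", 2, sum);
    NamedCounters total = new NamedCounters().mergeFrom(fileA, sum).mergeFrom(fileB, sum);
    System.out.println(total.asMap()); // prints {A&B=5, C=1}
  }
}
{code}

With `Long::sum` the result happens to look like a per-visibility count. Passing a different combiner changes what the magnitudes mean without changing the structure, which is the point about not baking histogram semantics into the API.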

bq. Keith had made a good point elsewhere that, when files overlap tablets, we might double-count.
That, combined with the counting (presently) not included on the IMM, these "counters" (winks)
would not be 100% accurate. Personally, I think that's OK.

I agree.
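To make the double-counting concern concrete, here is a small hypothetical illustration. All file names, tablet names, and numbers are made up, and it needs Java 9+ for `Map.of`/`List.of`. It assumes per-file counts are stored with each file and that one file can be referenced by more than one tablet. If each tablet naively sums the counts of its own files, the shared file is counted once per referencing tablet.

{code:java}
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// HYPOTHETICAL illustration; names and numbers are made up.
public final class OverlapDemo {
  // Per-file counts: file name -> (visibility -> count).
  static final Map<String, Map<String, Long>> FILE_COUNTS = Map.of(
      "F1.rf", Map.of("A", 10L),
      "F2.rf", Map.of("A", 4L, "B", 6L));

  // Tablet -> referenced files; F2.rf is shared by both tablets.
  static final Map<String, List<String>> TABLET_FILES = Map.of(
      "tablet1", List.of("F1.rf", "F2.rf"),
      "tablet2", List.of("F2.rf"));

  // Add one file's counts into the running totals.
  private static void addFile(Map<String, Long> totals, String file) {
    FILE_COUNTS.get(file).forEach((vis, n) -> totals.merge(vis, n, Long::sum));
  }

  // Naive: each tablet sums its own files, so the shared file is counted twice.
  static Map<String, Long> naiveTotals() {
    Map<String, Long> totals = new HashMap<>();
    TABLET_FILES.values().forEach(files -> files.forEach(f -> addFile(totals, f)));
    return totals;
  }

  // Deduplicated: visit each distinct file once, however many tablets reference it.
  static Map<String, Long> dedupedTotals() {
    Set<String> distinct = new LinkedHashSet<>();
    TABLET_FILES.values().forEach(distinct::addAll);
    Map<String, Long> totals = new HashMap<>();
    distinct.forEach(f -> addFile(totals, f));
    return totals;
  }

  public static void main(String[] args) {
    // prints A: naive=18, deduped=14
    System.out.println("A: naive=" + naiveTotals().get("A") + ", deduped=" + dedupedTotals().get("A"));
  }
}
{code}

Deduplicating by file removes this particular over-count for a whole-table answer. It still says nothing about data that exists only in memory, or about a range that covers only part of a file. That is why "not 100% accurate" seems like the right expectation.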

bq. Example of another aggregation function?

Here are some: https://en.wikipedia.org/wiki/Aggregate_function
I could think of others.
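As a hypothetical sketch of what "another aggregation function" could mean in practice, here are a few combiners that reduce the partial values reported for one name (say, one value per file or tablet). The class and field names are illustrative only. Each combiner is associative and commutative, so the partials can be merged in whatever order they arrive.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// HYPOTHETICAL combiners for merging partial values reported for one name.
// Each is associative and commutative, so partials can be merged in any order.
public final class AggregationDemo {
  static final BinaryOperator<Long> SUM = Long::sum;
  static final BinaryOperator<Long> MIN = Math::min;
  static final BinaryOperator<Long> MAX = Math::max;

  // Reduce the partial values (e.g. one per file or tablet) with the chosen combiner.
  static long aggregate(List<Long> partials, BinaryOperator<Long> combiner) {
    return partials.stream()
        .reduce(combiner)
        .orElseThrow(() -> new IllegalArgumentException("no partial values"));
  }

  public static void main(String[] args) {
    List<Long> partials = Arrays.asList(3L, 9L, 4L);
    System.out.println(aggregate(partials, SUM)); // prints 16
    System.out.println(aggregate(partials, MIN)); // prints 3
    System.out.println(aggregate(partials, MAX)); // prints 9
  }
}
{code}

Not every aggregate fits a single `long` combiner. An average, for example, would need a (sum, count) pair merged component-wise and divided only at the end.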


> Implement visibility histograms as a table feature
> --------------------------------------------------
>
>                 Key: ACCUMULO-4500
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4500
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client, tserver
>            Reporter: Josh Elser
>
> Add support to quickly extract a histogram of all of the visibilities stored in an Accumulo table.
> DISCUSS: https://lists.apache.org/thread.html/df5e764362a95277344fd2731a432e9fafc60595e7d30015d9a56b9c@%3Cdev.accumulo.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
