accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4500) Implement visibility histograms as a table feature
Date Wed, 19 Oct 2016 14:12:58 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588881#comment-15588881 ]

Josh Elser commented on ACCUMULO-4500:
--------------------------------------

bq. we should limit the feature to be a mapping of names (type: String) to counts (type: signed
Long)

:coughs: That... is a histogram. Label (string) on x-axis, numeric value on the y-axis. What
is this aversion to using the word "histogram"? I honestly don't understand the desire to
avoid using that word.

bq. I think it would be a good idea that when this information is exposed in the client API,
it should be retrievable through a user-supplied aggregation/combiner function

Can you sketch what you think this should look like? It's not entirely clear to me.

bq. The reasoning for this is that client code doesn't normally deal with things at the granularity
of files, but rather, the granularity of tablets, ranges, and tables

Keith made a good point elsewhere that, when files overlap tablets, we might double-count.
Combine that with the counting (presently) not covering the IMM, and these "counters" (winks)
would not be 100% accurate. Personally, I think that's OK.

bq. A summation function would probably be the most common, but certainly not the only useful
aggregation function.

Example of another aggregation function?
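
For reference, one possible reading of the quoted proposal: each file carries a map of
visibility label (String) to count (Long), and the client merges those maps with a
function it supplies. The sketch below is only an illustration under that assumption. The
class and method names are hypothetical and are not part of any Accumulo API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Hypothetical sketch: nothing here is an existing Accumulo class or method.
public class VisibilityHistogramSketch {

  // Merge per-file histograms (visibility label -> count) into one map,
  // resolving labels that appear in more than one file with the caller's combiner.
  static Map<String, Long> merge(List<Map<String, Long>> perFile,
                                 BinaryOperator<Long> combiner) {
    Map<String, Long> result = new TreeMap<>();
    for (Map<String, Long> histogram : perFile) {
      for (Map.Entry<String, Long> entry : histogram.entrySet()) {
        // Map.merge stores the value if the key is new, else applies the combiner.
        result.merge(entry.getKey(), entry.getValue(), combiner);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up counts for two files; real values would come from file metadata.
    Map<String, Long> fileA = Map.of("public", 10L, "secret", 2L);
    Map<String, Long> fileB = Map.of("public", 5L, "admin&audit", 1L);
    List<Map<String, Long>> files = List.of(fileA, fileB);

    System.out.println(merge(files, Long::sum)); // summation across files
    System.out.println(merge(files, Long::max)); // largest per-file count per label
  }
}
```

With these made-up inputs, summation gives "public" a count of 15, while taking the maximum
gives 10. If one file were counted for two tablets it overlaps, summation would add its
counts twice, which is the double-counting concern above.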

> Implement visibility histograms as a table feature
> --------------------------------------------------
>
>                 Key: ACCUMULO-4500
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4500
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client, tserver
>            Reporter: Josh Elser
>
> Add support to quickly extract a histogram of all of the visibilities stored in an Accumulo
> table.
> DISCUSS: https://lists.apache.org/thread.html/df5e764362a95277344fd2731a432e9fafc60595e7d30015d9a56b9c@%3Cdev.accumulo.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
