accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4730) Create an Entry length summarizer
Date Tue, 24 Oct 2017 15:29:00 GMT


Keith Turner commented on ACCUMULO-4730:

Great [~jkrdev]!  I added you to the contributors group for Accumulo in Jira and assigned
the issue to you.  Let me know if you have any questions.

> Create an Entry length summarizer
> ---------------------------------
>                 Key: ACCUMULO-4730
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Assignee: Jared R
>              Labels: newbie
>             Fix For: 2.0.0
> It would be very useful to have a built-in [Summarizer|]
that computes summary information about field lengths, specifically key length, row length,
family length, qualifier length, visibility length, and value length.  Whatever stats are
computed must be able to be computed incrementally.  For example, min, max, count, sum, and
a log2 histogram can all be computed incrementally.  I think these would be good stats to
start with.  Count and sum can be used to compute the average.  There is an example of
computing a log2 histogram in the Summarizer javadoc.
> The Summarizer could be named EntryLengthSummarizer and could produce summaries like
the following.
> {noformat}
> count=XXX     //no need to track this per field, it's the same for all
> key.min=XXX
> key.max=XXX
> key.sum=XXX
> key.logHist.8=XXX   //only output nonzero exponents
> key.logHist.9=XXX
> row.min=XXX
> row.max=XXX
> row.sum=XXX
> row.logHist.7=XXX
> row.logHist.8=XXX
> row.logHist.10=XXX
> family.min=XXX
> family.max=XXX
> family.sum=XXX
> family.logHist.6=XXX
> family.logHist.7=XXX
> etc...
> {noformat}
> This new summarizer would be placed in the [summarizers|]
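
The incremental statistics described in the quoted issue can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the Accumulo Summarizer API and not the implementation that was eventually committed: the class names EntryLengthSketch and FieldStats are hypothetical, only the row and value fields are tracked to keep it short, and the histogram bucket is taken as floor(log2(length)), which may differ from the rounding used in the Summarizer javadoc example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: incremental length statistics for entries (not Accumulo code). */
final class EntryLengthSketch {

  /** Running min, max, sum and log2 histogram for the length of one field. */
  static final class FieldStats {
    long min = Long.MAX_VALUE;
    long max = 0;
    long sum = 0;
    // logHist[i] counts lengths whose floor(log2) is i (assumed bucketing rule)
    final long[] logHist = new long[32];

    void accept(int length) {
      min = Math.min(min, length);
      max = Math.max(max, length);
      sum += length;
      logHist[bucket(length)]++;
    }

    /** Folds another partial result into this one; every stat here is mergeable. */
    void merge(FieldStats other) {
      min = Math.min(min, other.min);
      max = Math.max(max, other.max);
      sum += other.sum;
      for (int i = 0; i < logHist.length; i++) {
        logHist[i] += other.logHist[i];
      }
    }

    /** Writes field.min, field.max, field.sum and the nonzero histogram buckets. */
    void emit(String field, Map<String, Long> out) {
      out.put(field + ".min", min);
      out.put(field + ".max", max);
      out.put(field + ".sum", sum);
      for (int i = 0; i < logHist.length; i++) {
        if (logHist[i] > 0) {
          out.put(field + ".logHist." + i, logHist[i]);
        }
      }
    }

    /** floor(log2(length)); lengths 0 and 1 both land in bucket 0. */
    static int bucket(int length) {
      return length <= 1 ? 0 : 31 - Integer.numberOfLeadingZeros(length);
    }
  }

  long count = 0; // shared by all fields, so tracked once
  final FieldStats row = new FieldStats();
  final FieldStats value = new FieldStats();

  /** Updates the statistics with one entry, given its raw row and value bytes. */
  void accept(byte[] rowBytes, byte[] valueBytes) {
    count++;
    row.accept(rowBytes.length);
    value.accept(valueBytes.length);
  }

  /** Combines statistics that were computed separately, e.g. for different files. */
  void merge(EntryLengthSketch other) {
    count += other.count;
    row.merge(other.row);
    value.merge(other.value);
  }

  /** Produces a flat name-to-number map in the style of the proposed output. */
  Map<String, Long> summarize() {
    Map<String, Long> out = new TreeMap<>();
    out.put("count", count);
    if (count > 0) {
      row.emit("row", out);
      value.emit("value", out);
    }
    return out;
  }
}
```

Because every statistic is updated from the previous state plus one new length, and two partial results can be merged without revisiting the data, this shape fits the incremental requirement stated in the issue. The average length of a field is then its sum divided by count.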

This message was sent by Atlassian JIRA
