accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-4730) Create an Entry length summarizer
Date Tue, 24 Oct 2017 15:12:00 GMT
Keith Turner created ACCUMULO-4730:
--------------------------------------

             Summary: Create an Entry length summarizer
                 Key: ACCUMULO-4730
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4730
             Project: Accumulo
          Issue Type: Improvement
            Reporter: Keith Turner
             Fix For: 2.0.0


It would be very useful to have a built-in [Summarizer|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/summary/Summarizer.java]
that computes summary information about field lengths, specifically key length, row length,
family length, qualifier length, visibility length, and value length.  Whatever stats are
computed must be computable incrementally.  For example, min, max, count, sum, and a log2
histogram can all be computed incrementally, and I think these would be good stats to start
with.  Count and sum can be used to compute the average.  There is an example of computing a
log2 histogram in the Summarizer javadoc.
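
To make the "incrementally" requirement concrete, here is a minimal sketch of per-field length stats that can be updated one entry at a time and merged across partial results.  The class name LengthStats and its methods are hypothetical and not part of Accumulo.  The bucketing rule (floor of log2, with lengths 0 and 1 sharing bucket 0) is an assumption of this sketch and may differ from the example in the Summarizer javadoc.

```java
/** Hypothetical helper (not an Accumulo class): incremental length stats for one field. */
final class LengthStats {
  long min = Long.MAX_VALUE; // stays at MAX_VALUE until the first length is seen
  long max = 0;
  long sum = 0;
  long count = 0;
  // logHist[i] counts lengths whose floor(log2(length)) is i.
  final long[] logHist = new long[32];

  /** Assumed bucketing rule: floor(log2(length)); lengths 0 and 1 both map to bucket 0. */
  static int bucket(int length) {
    return length <= 1 ? 0 : 31 - Integer.numberOfLeadingZeros(length);
  }

  /** Folds one observed field length into the running stats. */
  void accept(int length) {
    min = Math.min(min, length);
    max = Math.max(max, length);
    sum += length;
    count++;
    logHist[bucket(length)]++;
  }

  /** Combines stats computed independently, for example over two different files. */
  void merge(LengthStats other) {
    min = Math.min(min, other.min);
    max = Math.max(max, other.max);
    sum += other.sum;
    count += other.count;
    for (int i = 0; i < logHist.length; i++) {
      logHist[i] += other.logHist[i];
    }
  }

  /** The average is derived from sum and count, so it does not need to be stored. */
  double average() {
    return count == 0 ? 0.0 : (double) sum / count;
  }
}
```

Every stat here is either a running min/max or a running total, so merging two partial results gives the same answer as processing all entries in one pass.  That property is what makes these stats suitable for a summarizer.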

The Summarizer could be named EntryLengthSummarizer and could produce summaries like the
following.

{noformat}
count=XXX     //do not need to track this per field, it's the same for all
key.min=XXX
key.max=XXX
key.sum=XXX
key.logHist.8=XXX   //only output nonzero exponents
key.logHist.9=XXX
row.min=XXX
row.max=XXX
row.sum=XXX
row.logHist.7=XXX
row.logHist.8=XXX
row.logHist.10=XXX
family.min=XXX
family.max=XXX
family.sum=XXX
family.logHist.6=XXX
family.logHist.7=XXX
etc...
{noformat}
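
As a rough illustration of the naming scheme above, the following sketch writes one field's stats into a map using the field.min, field.max, field.sum, and field.logHist.N keys, skipping histogram buckets that are zero.  The class SummaryKeys and its emit method are hypothetical, and using a Map<String, Long> as the output is an assumption of this sketch rather than a statement about how the summarizer must report its results.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not an Accumulo class): emits one field's stats under prefixed keys. */
final class SummaryKeys {
  /**
   * Writes min, max, sum, and the nonzero log2 histogram buckets for one field into out.
   * The shared entry count is not written here because it is the same for every field.
   */
  static void emit(String field, long min, long max, long sum, long[] logHist,
      Map<String, Long> out) {
    out.put(field + ".min", min);
    out.put(field + ".max", max);
    out.put(field + ".sum", sum);
    for (int exp = 0; exp < logHist.length; exp++) {
      if (logHist[exp] != 0) { // only output nonzero exponents
        out.put(field + ".logHist." + exp, logHist[exp]);
      }
    }
  }

  public static void main(String[] args) {
    // Made-up illustrative numbers: three rows with lengths 100, 300, and 1500.
    long[] rowHist = new long[32];
    rowHist[6] = 1;
    rowHist[8] = 1;
    rowHist[10] = 1;
    Map<String, Long> summary = new TreeMap<>();
    summary.put("count", 3L);
    emit("row", 100, 1500, 1900, rowHist, summary);
    summary.forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

Emitting only nonzero buckets keeps the summary small when lengths cluster in a narrow range, which is the common case for fields like family and visibility.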

This new summarizer would be placed in the [summarizers|https://github.com/apache/accumulo/tree/master/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers]
package.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
