Date: Tue, 24 Oct 2017 15:49:00 +0000 (UTC)
From: "Jared R (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Commented] (ACCUMULO-4730) Create an Entry length summarizer

    [ https://issues.apache.org/jira/browse/ACCUMULO-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217130#comment-16217130 ]

Jared R commented on ACCUMULO-4730:
-----------------------------------

Thanks! Will do

> Create an Entry length summarizer
> ---------------------------------
>
>                 Key: ACCUMULO-4730
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4730
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Assignee: Jared R
>              Labels: newbie
>             Fix For: 2.0.0
>
>
> It would be very useful to have a built-in [Summarizer|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/summary/Summarizer.java] that computes summary information about field lengths, specifically key length, row length, family length, qualifier length, visibility length, and value length. Whatever stats are computed must be able to be computed incrementally. For example, min, max, count, sum, and a log2 histogram can all be computed incrementally. I think these would be good stats to start with.
> Count and sum can be used to compute the average. There is an example of computing a log2 histogram in the Summarizer javadoc.
> The Summarizer could be named EntryLengthSummarizer and possibly produce summaries like the following.
> {noformat}
> count=XXX //do not need to track this per field, it's the same for all
> key.min=XXX
> key.max=XXX
> key.sum=XXX
> key.logHist.8=XXX //only output non zero exponents
> key.logHist.9=XXX
> row.min=XXX
> row.max=XXX
> row.sum=XXX
> row.logHist.7=XXX
> row.logHist.8=XXX
> row.logHist.10=XXX
> family.min=XXX
> family.max=XXX
> family.sum=XXX
> family.logHist.6=XXX
> family.logHist.7=XXX
> etc...
> {noformat}
> This new summarizer would be placed in the [summarizers|https://github.com/apache/accumulo/tree/master/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers] package.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
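
For readers of the archive, the incremental statistics requested in the issue (min, max, sum, and a log2 histogram per field, with count tracked once for all fields) could be sketched roughly as below. This is a language-neutral illustration in Python, not the Accumulo Summarizer API: the class LengthStats and its methods accept, merge, and summarize are hypothetical names, and bucketing by floor(log2(length)) with zero lengths placed in bucket 0 is an assumption that may differ from the example in the Summarizer javadoc.

```python
from collections import Counter


class LengthStats:
    """Running min, max, sum and log2 histogram for one field's lengths.

    Hypothetical helper for illustration only; not part of Accumulo.
    """

    def __init__(self):
        self.min = None
        self.max = None
        self.sum = 0
        self.log_hist = Counter()  # exponent -> number of lengths in that bucket

    def accept(self, length):
        """Fold one observed length into the running statistics."""
        self.min = length if self.min is None else min(self.min, length)
        self.max = length if self.max is None else max(self.max, length)
        self.sum += length
        # Assumption: bucket by floor(log2(length)); zero lengths go in bucket 0.
        exponent = length.bit_length() - 1 if length > 0 else 0
        self.log_hist[exponent] += 1

    def merge(self, other):
        """Combine with statistics computed independently over other entries."""
        if other.min is not None:
            self.min = other.min if self.min is None else min(self.min, other.min)
            self.max = other.max if self.max is None else max(self.max, other.max)
        self.sum += other.sum
        self.log_hist.update(other.log_hist)  # Counter.update adds counts

    def summarize(self, prefix):
        """Emit flat key/value statistics; only observed exponents appear."""
        if self.min is None:
            return {}
        out = {
            f"{prefix}.min": self.min,
            f"{prefix}.max": self.max,
            f"{prefix}.sum": self.sum,
        }
        for exponent, n in sorted(self.log_hist.items()):
            out[f"{prefix}.logHist.{exponent}"] = n
        return out


# Usage: two partial summaries of row lengths, merged into one.
first, second = LengthStats(), LengthStats()
for row_length in (3, 300):
    first.accept(row_length)
second.accept(9)
first.merge(second)

count = 3  # tracked once, since it is the same for every field
summary = {"count": count, **first.summarize("row")}
average_row_length = summary["row.sum"] / summary["count"]
print(summary, average_row_length)
```

Every statistic here is updated from a single new length or combined from two partial results, so no individual lengths need to be kept. That is the property the issue asks for, and it is also why the average is derived from count and sum rather than stored.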