From: "peter.marron@baesystems.com"
To: "user@hive.apache.org"
Subject: Table statistics
Date: Tue, 15 Dec 2015 09:39:23 +0000

Hi,

I was wondering if there is any "recognized" way to obtain table statistics.
Ideally, given a Key range, I would like to know the number of distinct row IDs, the number of entries, and the amount of data (in bytes) in that key range.
I assume that Accumulo holds at least some of this information internally, partly because I can see some of it through the monitor, and partly because it must know something about the quantity of data held in order to implement the table threshold.

In my case the tables are very static, so the "estimates" that the monitor has are likely to be sufficiently accurate for my purposes.

I have found this link
http://apache-accumulo.1065345.n5.nabble.com/Determining-tablets-assigned-to-table-splits-and-the-number-of-rows-in-each-tablet-td11546.html
which describes a process (which I haven't tried yet) to get the number of entries in a range. That would probably be sufficient for me and would certainly be a good start.
However, it seems to use internal data structures and non-published APIs, which is less than ideal. It also seems to be written against Accumulo version 1.6.

I'm using Accumulo 1.7. Is there anything better that I can do, or is it recommended that this is the way to go?

Regards,

Z
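For concreteness, the three statistics asked about above can be pinned down with a brute-force sketch. This is only an illustration of the definitions, not Accumulo code and not the approach from the linked thread: it walks an in-memory, row-sorted list instead of a real table. The names `Entry`, `RangeStats` and `range_stats` are hypothetical, the range is assumed to be inclusive at both ends, and the byte count is an assumed approximation (encoded row + column + value), not how Accumulo itself measures size.

```python
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for one key/value pair in a sorted table."""
    row: str
    column: str
    value: bytes


class RangeStats(NamedTuple):
    distinct_rows: int
    entries: int
    size_bytes: int


def range_stats(entries: Iterable[Entry], start_row: str, end_row: str) -> RangeStats:
    """Count distinct rows, entries and bytes for start_row <= row <= end_row.

    Assumes `entries` is sorted by row, so a distinct row is detected
    whenever the row differs from the previous in-range entry.
    """
    distinct_rows = 0
    count = 0
    size = 0
    previous_row = None
    for entry in entries:
        if entry.row < start_row:
            continue  # before the range
        if entry.row > end_row:
            break  # sorted input: nothing further can be in range
        count += 1
        # Assumed size model: encoded row + column + value bytes.
        size += len(entry.row.encode()) + len(entry.column.encode()) + len(entry.value)
        if entry.row != previous_row:
            distinct_rows += 1
            previous_row = entry.row
    return RangeStats(distinct_rows, count, size)


# Made-up sample data, sorted by row.
sample_entries = [
    Entry("a", "c1", b"1"),
    Entry("b", "c1", b"22"),
    Entry("b", "c2", b"333"),
    Entry("c", "c1", b"4"),
    Entry("d", "c1", b"55"),
]

stats = range_stats(sample_entries, "b", "c")
```

A full scan like this is exact but costs time proportional to the data in the range, which is why an estimate already held by the system would be preferable for large, static tables.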
