From: Jamie Stephens
To: user@accumulo.apache.org
Date: Fri, 27 Jun 2014 10:05:39 -0500
Subject: Re: Scanner.estimatedCount()?

Eric,

Thanks. Yeah, it's pretty easy to sample during ingest. That's probably what I'll do. In the past, I've also done the traditional batch statistics generation. Would be easy here with MapReduce+combiner.

--Jamie
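Sampling during ingest could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not an Accumulo API: `IngestSampler` and `RangeEstimate` are hypothetical names, the sample is kept with reservoir sampling (one of several possible sampling schemes), and row keys are assumed to be ASCII so that Java `String` order matches Accumulo's byte order. Besides a count it returns a standard error, which is one simple form of the "confidence" asked about in the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical result type: estimated key count plus its standard error. */
class RangeEstimate {
    final double count;
    final double stdErr;

    RangeEstimate(double count, double stdErr) {
        this.count = count;
        this.stdErr = stdErr;
    }
}

/**
 * Hypothetical ingest-time sampler (not an Accumulo class). Keeps a uniform
 * random sample of row keys (reservoir sampling) and the exact number of keys
 * seen. Assumes ASCII rows, so String order matches Accumulo's byte order.
 */
class IngestSampler {
    private final int capacity;
    private final List<String> reservoir = new ArrayList<>();
    private final Random random;
    private long seen = 0;

    IngestSampler(int capacity, long seed) {
        this.capacity = capacity;
        this.random = new Random(seed);
    }

    /** Call once for every key written during ingest. */
    void observe(String row) {
        seen++;
        if (reservoir.size() < capacity) {
            reservoir.add(row);
        } else {
            // Keep the new row with probability capacity / seen,
            // replacing a uniformly chosen existing sample.
            long j = (long) (random.nextDouble() * seen);
            if (j < capacity) {
                reservoir.set((int) j, row);
            }
        }
    }

    /** Estimate the number of keys with startRow <= row < stopRow. */
    RangeEstimate estimate(String startRow, String stopRow) {
        int n = reservoir.size();
        if (n == 0) {
            return new RangeEstimate(0.0, 0.0);
        }
        int hits = 0;
        for (String row : reservoir) {
            if (row.compareTo(startRow) >= 0 && row.compareTo(stopRow) < 0) {
                hits++;
            }
        }
        double p = (double) hits / n;
        // Finite population correction: the sample is drawn without replacement,
        // so the error drops to zero once every key is in the sample.
        double fpc = seen > 1 ? Math.sqrt((double) (seen - n) / (seen - 1)) : 0.0;
        double stdErr = seen * Math.sqrt(p * (1 - p) / n) * fpc;
        return new RangeEstimate(p * seen, stdErr);
    }
}
```

A planner would call `observe()` for each ingested row and later `estimate(start, stop)`, treating `count ± 2 * stdErr` as a rough band. The sample size bounds both memory and accuracy, so it is a tuning knob rather than a fixed choice.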



On Fri, Jun 27, 2014 at 9:40 AM, Eric Newton <eric.newton@gmail.com> wrote:
> Short answer: no.
>
> Long answer:
>
> You can scan the metadata table for the count/size of the files.
>
> You can query tablet servers for the basic stats of every tablet for a given table. This is used for balancing.
>
> But really you should collect the statistics you want during ingest and insert them in another table.
>
> -Eric
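Eric's last suggestion (count during ingest and store the counts in a separate table) can be sketched without any Accumulo dependency. Everything here is an illustrative assumption: `IngestCounter` and `StatsTable` are hypothetical names, rows are bucketed by a fixed-length prefix, and a `TreeMap` stands in for the stats table. In a real deployment each flush would presumably be written as mutations to that table, with a summing combiner (for example Accumulo's `SummingCombiner`, if configured on it) adding up the partial counts server-side; `merge()` plays that role here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for a separate stats table: bucket -> key count.
 * merge() sums partial counts, the job a summing combiner would do server-side.
 */
class StatsTable {
    private final TreeMap<String, Long> counts = new TreeMap<>();

    /** Apply a batch of partial counts, adding them to what is already stored. */
    void merge(Map<String, Long> deltas) {
        deltas.forEach((bucket, n) -> counts.merge(bucket, n, Long::sum));
    }

    /** Total count for buckets in [startBucket, stopBucket). */
    long countRange(String startBucket, String stopBucket) {
        long total = 0;
        for (long n : counts.subMap(startBucket, true, stopBucket, false).values()) {
            total += n;
        }
        return total;
    }
}

/** Hypothetical per-client counter: buckets rows by prefix, flushed periodically. */
class IngestCounter {
    private final int prefixLength;
    private final Map<String, Long> pending = new HashMap<>();

    IngestCounter(int prefixLength) {
        this.prefixLength = prefixLength;
    }

    /** Call once for every row written during ingest. */
    void observe(String row) {
        String bucket = row.length() <= prefixLength ? row : row.substring(0, prefixLength);
        pending.merge(bucket, 1L, Long::sum);
    }

    /** Send local partial sums to the stats table, then start over. */
    void flushTo(StatsTable stats) {
        stats.merge(pending);
        pending.clear();
    }
}
```

The same local-sum-then-global-sum pattern is what a MapReduce job with a combiner does in the batch approach mentioned at the top of this message; only the timing differs. The bucket size trades stats-table size against how precisely a range can be answered.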


> On Fri, Jun 27, 2014 at 9:42 AM, Jamie Stephens <js@morphism.com> wrote:
>
>> Is there a way to get a quick estimate of the number of keys in a given range?
>>
>> Perhaps more generally, getting an estimate of the amount of work (and even some sort of confidence based on, say, the age of something) to iterate over a range.
>>
>> I'd like to do some query planning, so statistics like these sure would be nice.
>>
>> --Jamie
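For the question as asked, the metadata-table and tablet-server options in Eric's reply only give per-file or per-tablet totals, so a range estimate built on them is coarse. A minimal sketch, assuming per-tablet entry counts have already been gathered (for instance by summing the entry counts recorded for each tablet's files in the metadata table; how to read and parse those entries is left out and is itself an assumption). `TabletSummary` and `TabletEstimator` are hypothetical names, not Accumulo classes, and rows are assumed to be ASCII strings.

```java
import java.util.List;

/**
 * Hypothetical per-tablet summary (not an Accumulo class): the tablet's
 * extent (prevEndRow, endRow] and an entry count gathered beforehand.
 */
class TabletSummary {
    final String prevEndRow; // null = first tablet (no lower bound)
    final String endRow;     // null = last tablet (no upper bound)
    final long entries;

    TabletSummary(String prevEndRow, String endRow, long entries) {
        this.prevEndRow = prevEndRow;
        this.endRow = endRow;
        this.entries = entries;
    }

    /** True if the extent can contain rows r with startRow <= r < stopRow. */
    boolean overlaps(String startRow, String stopRow) {
        boolean endsAtOrAfterStart = endRow == null || endRow.compareTo(startRow) >= 0;
        boolean beginsBeforeStop = prevEndRow == null || prevEndRow.compareTo(stopRow) < 0;
        return endsAtOrAfterStart && beginsBeforeStop;
    }
}

class TabletEstimator {
    /**
     * Coarse estimate: sum the entry counts of every tablet the range touches.
     * Tablets only partly covered by the range are counted in full.
     */
    static long estimate(List<TabletSummary> tablets, String startRow, String stopRow) {
        long total = 0;
        for (TabletSummary t : tablets) {
            if (t.overlaps(startRow, stopRow)) {
                total += t.entries;
            }
        }
        return total;
    }
}
```

Because partly covered tablets count in full, and file-level counts can include entries that a later compaction would drop while missing data not yet written to files, the result is better treated as a rough planning figure than as a key count.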


