accumulo-user mailing list archives

From Josh Elser
Subject Re: Scanner.estimatedCount()?
Date Fri, 27 Jun 2014 15:04:12 GMT
You could do this fairly efficiently by leveraging the CountingIterator 
to get an exact count (taking visibilities into account, as well) for 
the range in question. It isn't going to be as fast as a precomputed 
answer, but you could cache that easily.
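
To illustrate the caching idea, here is a minimal sketch in plain Java. It does not use the Accumulo API: the TreeMap is only a stand-in for a sorted table, each entry carries a single visibility label rather than a full visibility expression, and the class and method names (CachedRangeCounter, put, count) are hypothetical. The point it shows is that the cache key has to include the authorizations as well as the range, and that writes invalidate cached counts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: a sorted map of row -> visibility label stands in for a table. */
class CachedRangeCounter {
    private final NavigableMap<String, String> table = new TreeMap<>();
    private final Map<String, Long> cache = new HashMap<>();
    long fullScans = 0; // how many times we actually walked the range

    /** Writes an entry; an empty label means "visible to everyone" in this sketch. */
    void put(String row, String label) {
        table.put(row, label);
        cache.clear(); // simplest possible invalidation: any write drops all cached counts
    }

    /** Exact count of entries in [start, end) that the given authorizations can see. */
    long count(String start, String end, Set<String> auths) {
        // The key includes the (sorted) authorizations, since they change the answer.
        String key = start + "\u0000" + end + "\u0000" + new TreeSet<>(auths);
        Long cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        fullScans++;
        long n = 0;
        for (String label : table.subMap(start, true, end, false).values()) {
            if (label.isEmpty() || auths.contains(label)) {
                n++;
            }
        }
        cache.put(key, n);
        return n;
    }
}
```

In a real deployment the counting loop would be replaced by a scan over the range with the caller's authorizations, and the invalidation policy would probably be time-based rather than "clear on every write".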

The fact that visibilities affect the cardinality of a term makes 
it harder for us to provide this within Accumulo. In the situations where 
Accumulo itself cares about cardinality, it is agnostic of the 
visibilities. It would be possible to build an index of this 
information internally but, as Eric said, that's not there today.
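
To make the visibility problem concrete, here is a rough sketch (again plain Java, not the Accumulo API) of the "collect statistics during ingest" approach Eric describes in the quoted message below, with the counts split out per visibility label so that an estimate can respect the reader's authorizations. Everything here is an assumption made for illustration: the IngestStats class, bucketing by the first character of the row, and the use of a single label instead of a full visibility expression. In practice the counters would live in a separate table rather than in memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model only: per-bucket, per-visibility counters maintained at write time. */
class IngestStats {
    // bucket (first character of the row) -> visibility label -> number of keys written
    private final TreeMap<String, Map<String, Long>> counts = new TreeMap<>();

    /** Call once per key written; row must be non-empty, empty label means public. */
    void recordWrite(String row, String label) {
        String bucket = row.substring(0, 1);
        counts.computeIfAbsent(bucket, b -> new HashMap<>()).merge(label, 1L, Long::sum);
    }

    /** Sums the counters for buckets in [startBucket, endBucket] that the auths can see. */
    long estimate(String startBucket, String endBucket, Set<String> auths) {
        long total = 0;
        for (Map<String, Long> perLabel : counts.subMap(startBucket, true, endBucket, true).values()) {
            for (Map.Entry<String, Long> e : perLabel.entrySet()) {
                if (e.getKey().isEmpty() || auths.contains(e.getKey())) {
                    total += e.getValue();
                }
            }
        }
        return total;
    }
}
```

The result is only an estimate at bucket granularity: it counts writes rather than live keys, so overwrites and deletes make it drift, and a range that covers part of a bucket is charged for the whole bucket.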

On 6/27/14, 10:40 AM, Eric Newton wrote:
> Short answer: no.
> Long answer:
> You can scan the metadata table for the count/size of the files.
> You can query tablet servers for the basic stats of every tablet for a
> given table.  This is used for balancing.
> But really you should collect the statistics you want during ingest and
> insert them in another table.
> -Eric
> On Fri, Jun 27, 2014 at 9:42 AM, Jamie Stephens wrote:
>     Is there a way to get a quick estimate of the number of keys in a
>     given range?
>     Perhaps more generally, getting an estimate of the amount of work
>     (and even some sort of confidence based on, say, the age of
>     something) to iterate over a range.
>     I'd like to do some query planning, so statistics like these sure
>     would be nice.
>     --Jamie
