accumulo-user mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Re: Scanner.estimatedCount()?
Date Fri, 27 Jun 2014 16:48:33 GMT
This class shows how to use a HyperLogLog object to track cardinality
during ingest:

https://github.com/medined/D4M_Schema/blob/master/schema/src/main/java/com/codebits/d4m/ingest/MutationFactory.java
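
For readers who have not met HyperLogLog before, here is a minimal,
from-scratch sketch of the idea in plain Java. It is not the code from
the linked MutationFactory and not any library's API; the class name
HllSketch, the hash function, and the parameter p are all assumptions
made for illustration. During ingest you would call add() for each value
whose cardinality you care about (a row ID, say) and persist the
estimate or the registers somewhere alongside the data.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative HyperLogLog; hypothetical class, not the one used in the linked project. */
class HllSketch {
    private final int p;            // number of hash bits used to pick a register
    private final byte[] registers; // m = 2^p registers, each holding a max "rank"

    HllSketch(int p) {
        if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** Record one value seen during ingest. Duplicates do not change the estimate. */
    void add(String value) {
        long h = hash64(value.getBytes(StandardCharsets.UTF_8));
        int idx = (int) (h >>> (64 - p));   // top p bits choose the register
        long rest = h << p;                 // remaining bits, left-aligned
        // rank = position of the first 1 bit in the remaining bits, capped if they are all zero
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    /** Approximate number of distinct values added so far. */
    long estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m); // bias constant, valid for m >= 128
        double raw = alpha * m * m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            // small-range correction: fall back to linear counting
            return Math.round(m * Math.log((double) m / zeros));
        }
        return Math.round(raw);
    }

    /** FNV-1a followed by the MurmurHash3 64-bit finalizer; an assumed, non-cryptographic hash. */
    static long hash64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    public static void main(String[] args) {
        HllSketch sketch = new HllSketch(12);
        // 100,000 adds but only 50,000 distinct values
        for (int i = 0; i < 100_000; i++) sketch.add("row" + (i % 50_000));
        System.out.println("estimated distinct values: " + sketch.estimate());
    }
}
```

With p = 12 the sketch uses 4096 one-byte registers and has a typical
relative error of roughly 1.04 / sqrt(4096), or about 1.6%. A larger p
trades memory for accuracy.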

On Fri, Jun 27, 2014 at 11:05 AM, Jamie Stephens <js@morphism.com> wrote:
> Eric,
>
> Thanks.  Yeah, it's pretty easy to sample during ingest.  That's probably
> what I'll do.  In the past, I've also done the traditional batch statistics
> generation.  Would be easy here with MapReduce+combiner.
>
> --Jamie
>
>
>
> On Fri, Jun 27, 2014 at 9:40 AM, Eric Newton <eric.newton@gmail.com> wrote:
>>
>> Short answer: no.
>>
>> Long answer:
>>
>> You can scan the metadata table for the count/size of the files.
>>
>> You can query tablet servers for the basic stats of every tablet for a
>> given table.  This is used for balancing.
>>
>> But really you should collect the statistics you want during ingest and
>> insert them in another table.
>>
>> -Eric
>>
>>
>> On Fri, Jun 27, 2014 at 9:42 AM, Jamie Stephens <js@morphism.com> wrote:
>>>
>>> Is there a way to get a quick estimate of the number of keys in a given
>>> range?
>>>
>>> Perhaps more generally, getting an estimate of the amount of work (and
>>> even some sort of confidence based on, say, the age of something) to iterate
>>> over a range.
>>>
>>> I'd like to do some query planning, so statistics like these sure would
>>> be nice.
>>>
>>> --Jamie
>>>
>>
>
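
For the original question in this thread (a quick estimate of the number
of keys in a range), Eric's advice to collect statistics during ingest
could be sketched roughly as follows. This is plain Java with no
Accumulo calls; RangeStats, prefixLen, record() and estimate() are
hypothetical names. It counts rows per fixed-length row prefix and
answers a range query by summing every bucket that could overlap the
range, so the result is an upper bound, not an exact count. In a real
deployment the per-bucket counts would presumably live in a separate
Accumulo table, as Eric suggests, instead of in memory.

```java
import java.util.TreeMap;

/** Hypothetical per-prefix row counter kept alongside ingest. */
class RangeStats {
    private final int prefixLen;
    // bucket (row prefix) -> number of rows recorded with that prefix
    private final TreeMap<String, Long> counts = new TreeMap<>();

    RangeStats(int prefixLen) {
        if (prefixLen < 1) throw new IllegalArgumentException("prefixLen must be >= 1");
        this.prefixLen = prefixLen;
    }

    private String bucketOf(String row) {
        return row.length() <= prefixLen ? row : row.substring(0, prefixLen);
    }

    /** Call once per row written during ingest. */
    void record(String row) {
        counts.merge(bucketOf(row), 1L, Long::sum);
    }

    /**
     * Upper-bound estimate of rows in [startRow, endRow).
     * Assumes startRow <= endRow. Note that String ordering matches
     * Accumulo's byte ordering only for ASCII row IDs.
     */
    long estimate(String startRow, String endRow) {
        long total = 0;
        // Truncating to a prefix never reorders keys, so every row in the
        // range falls in a bucket between bucketOf(startRow) and endRow.
        for (long c : counts.subMap(bucketOf(startRow), true, endRow, false).values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        RangeStats stats = new RangeStats(1);
        for (int i = 1; i <= 100; i++) stats.record(String.format("a%03d", i));
        for (int i = 1; i <= 50; i++) stats.record(String.format("b%03d", i));
        System.out.println("rows in [a, b): at most " + stats.estimate("a", "b"));
    }
}
```

A longer prefix gives tighter bounds at the cost of more buckets to
store and sum. Buckets at the edges of the range are counted in full,
which is where the overestimate comes from.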
