accumulo-dev mailing list archives

From Chris Bennight <>
Subject Re: Accumulo iterator to return a random sample of a percentile of a table
Date Wed, 05 Feb 2014 01:26:50 GMT
I'm assuming you want a random selection of entries in Accumulo, so, say, a
random selection of keys/values?
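If per-entry sampling is what you want, one common approach is to keep each row with probability p, deciding by a hash of the row ID instead of a random number generator, so a rescan returns the same sample and every tablet server agrees without coordination. The sketch below shows only that decision logic as plain Java. The class name `KeySampler` and the idea of calling it from a custom iterator extending Accumulo's `Filter` (inside its `accept(Key, Value)` method) are my assumptions, not existing Accumulo code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides whether a row falls into a fixed-fraction sample.
// In an Accumulo iterator this check could sit inside a Filter's accept(Key, Value).
final class KeySampler {
    private final double fraction; // e.g. 0.05 keeps roughly 5% of rows

    KeySampler(double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        this.fraction = fraction;
    }

    // MurmurHash3 64-bit finalizer: spreads similar inputs across all 64 bits.
    private static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Deterministic: the same row is always kept or always dropped, so repeated
    // scans agree on the sample. Hashing the row (not the full key) keeps rows whole.
    boolean accept(String row) {
        long h = mix(row.hashCode());
        double u = (h >>> 11) * 0x1.0p-53; // top 53 bits -> double in [0, 1)
        return u < fraction;
    }

    public static void main(String[] args) {
        KeySampler sampler = new KeySampler(0.10);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            String row = String.format("row%06d", i);
            if (sampler.accept(row)) {
                kept.add(row);
            }
        }
        System.out.println("kept " + kept.size() + " of 100000 rows");
    }
}
```

The trade-off is that a fixed hash always picks the same rows; if you need a different sample per query, you could mix a per-query seed into the hash input.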

How are your keys formatted (conceptually is fine)? Is there some sort of
regularity to them? (I.e., can you calculate a random distribution of keys
ahead of time, without checking which keys are actually present?)
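To make the "calculate ahead of time" case concrete: if row IDs follow a known pattern, you can draw the sample client-side and fetch only those rows, with no full-table scan. The sketch below assumes a hypothetical scheme where rows are `row` plus a zero-padded 6-digit index in [0, n), and uses Robert Floyd's algorithm to pick k distinct indices without materializing the whole range. In a real client each ID would then become a single-row range handed to a BatchScanner; that wiring is not shown here.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical key scheme: row IDs are "row" + a zero-padded 6-digit index,
// so the whole key space [0, n) is known without scanning the table.
final class RandomRowIds {
    // Robert Floyd's algorithm: k distinct indices from [0, n) in O(k) expected
    // time and memory, without materializing or shuffling the full range.
    static Set<Integer> sampleIndices(int n, int k, Random rnd) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        Set<Integer> chosen = new HashSet<>();
        for (int j = n - k; j < n; j++) {
            int t = rnd.nextInt(j + 1); // uniform in [0, j]
            if (!chosen.add(t)) {
                chosen.add(j); // t was already taken; j cannot have been
            }
        }
        return chosen;
    }

    // Sorted row IDs, ready to be turned into one single-row range each.
    static TreeSet<String> sampleRowIds(int n, int k, long seed) {
        if (n > 1_000_000) {
            throw new IllegalArgumentException("6-digit padding assumes n <= 1,000,000");
        }
        TreeSet<String> rows = new TreeSet<>();
        for (int idx : sampleIndices(n, k, new Random(seed))) {
            rows.add(String.format("row%06d", idx));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(sampleRowIds(100_000, 5, 42L));
    }
}
```

This only works if every ID in the computed key space actually exists; gaps would show up as missing rows and shrink the sample.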

If you can't calculate the key distribution ahead of time, are you keeping
(or could you keep) any statistics on ingest (cardinality, distribution,
etc.)? And finally, how rigorous and performant does this random sampling
need to be? Do you just want representative data, or are you trying to do
something like BlinkDB[1] (letting people specify confidence intervals on
queries, and sampling only enough data to meet the requisite uncertainty)?
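To show why the confidence-interval framing matters: the number of rows needed to estimate a proportion to a given margin of error depends mostly on the margin, not on table size. The sketch below uses the standard textbook formula n0 = z^2 p(1 - p) / e^2 with a finite-population correction. It is not BlinkDB's actual method, and the names `SampleSize` and `forProportion` are hypothetical.

```java
// Textbook sample size for estimating a proportion (not BlinkDB's own method):
//   n0 = z^2 * p * (1 - p) / e^2, then a finite-population correction for N rows.
final class SampleSize {
    static long forProportion(double z, double margin, double p, long population) {
        if (margin <= 0 || p < 0 || p > 1 || population < 1) {
            throw new IllegalArgumentException("bad arguments");
        }
        double n0 = z * z * p * (1 - p) / (margin * margin);
        // Finite population correction: small tables need fewer samples.
        double n = n0 / (1 + (n0 - 1) / population);
        return (long) Math.min(population, Math.ceil(n));
    }

    public static void main(String[] args) {
        long rows = 10_000_000L;
        // z = 1.96 is roughly 95% confidence; +/- 1 percentage point; worst case p = 0.5
        long needed = forProportion(1.96, 0.01, 0.5, rows);
        System.out.printf("%d rows, about %.3f%% of the table%n", needed, 100.0 * needed / rows);
    }
}
```

With these parameters the answer stays near 9,600 rows whether the table holds a million or a billion entries, so a confidence target often translates into a fixed sample size rather than a fixed percentage.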



On Sat, Feb 1, 2014 at 3:58 PM, cprigano <> wrote:

> I am looking at writing an Accumulo iterator to return a random sample of a
> percentile of a table.
> I would appreciate any suggestions.
> Thanks,
> Chris
> --
> Sent from the Developers mailing list archive at
