accumulo-dev mailing list archives

From cprigano <chris.p.rig...@gmail.com>
Subject Re: Accumulo iterator to return a random sample of a percentile of a table
Date Wed, 05 Feb 2014 03:39:50 GMT
Good questions all! I aim to start by just taking a percentage of the rows
in a table, in order to construct training, cross-validation and testing
sets. I am a machine learning person and want to be able to take, say, a
25% random sample of the rows in a table (I may not know the table's size,
and the percentage should be settable). Starting with the easiest
assumption, that all rows are the same "type", will get things started. I
can then move to more exotic scenarios. Accumulo is a new nut for me to
crack and I would very much like your thoughts. Thanks mate!
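
One way to picture the sampling described above, when the table size is
unknown, is to decide per row by hashing the row ID into [0, 1) and keeping
the row when the value falls below the chosen fraction. The sketch below is
plain Java and does not use any Accumulo API; the class RowSampler and its
methods are hypothetical names chosen for illustration, and it assumes row
IDs are strings and that all rows are of the same type.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Accumulo): decides per row ID whether
// the row belongs to a sample of a settable fraction.
final class RowSampler {
    private final double fraction; // e.g. 0.25 for a 25% sample

    RowSampler(double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        this.fraction = fraction;
    }

    // Maps a row ID to a repeatable pseudo-random value in [0, 1).
    static double bucket(String rowId) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(rowId.getBytes(StandardCharsets.UTF_8));
            long bits = 0L;
            for (int i = 0; i < 8; i++) {
                bits = (bits << 8) | (digest[i] & 0xFF);
            }
            // Keep the top 53 bits so the result is an exact double in [0, 1).
            return (bits >>> 11) / (double) (1L << 53);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // True if the row falls inside the sampled fraction.
    boolean accept(String rowId) {
        return bucket(rowId) < fraction;
    }

    // Assigns a row to "train", "cv" or "test" using disjoint bucket ranges.
    static String split(String rowId, double train, double cv) {
        double b = bucket(rowId);
        if (b < train) {
            return "train";
        }
        if (b < train + cv) {
            return "cv";
        }
        return "test";
    }
}
```

For example, new RowSampler(0.25).accept(rowId) keeps roughly a quarter of
the rows, and RowSampler.split(rowId, 0.6, 0.2) yields roughly a 60/20/20
split. Because the decision depends only on the row ID, every entry of a
row is kept or dropped together and the same rows are chosen on every scan;
a fresh sample on each scan would need a seed mixed into the hash. Inside a
server-side iterator the same check would presumably be applied to the row
portion of each key, which is an assumption and not shown here.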


On Tue, Feb 4, 2014 at 7:27 PM, Chris Bennight [via Apache Accumulo] <
ml-node+s1065345n7394h67@n5.nabble.com> wrote:

> I'm assuming you want a random selection of entries in Accumulo - so say a
> random selection of keys/values?
>
> How are your keys formatted (conceptually is fine); is there some sort of
> regularity to them?  (I.e. can you calculate ahead of time a random
> distribution of keys without validating which keys are present)?
>
> If you can't calculate the key distribution ahead of time, are you keeping
> any statistics (or could you) on ingest (cardinality, distribution, etc.) -
> and finally, how rigorous and performant do you need this random sampling
> to be?  Do you just want representative data, or are you trying to do
> something like BlinkDB[1] (allow people to specify confidence intervals on
> queries, and only sample enough data to meet the requisite uncertainty
> requirements)?
>
> [1] http://blinkdb.org/
>
> Chris
>
>
>
>
> On Sat, Feb 1, 2014 at 3:58 PM, cprigano <[hidden email]> wrote:
>
> > I am looking at writing an Accumulo iterator to return a random sample
> > of a percentile of a table.
> >
> > I would appreciate any suggestions.
> >
> > Thanks,
> >
> > Chris
> >
> >
> >
> > --
> > View this message in context:
> >
> http://apache-accumulo.1065345.n5.nabble.com/Accumulo-iterator-to-return-a-random-sample-of-a-percentile-of-a-table-tp7354.html
> > Sent from the Developers mailing list archive at Nabble.com.
> >
>
>




--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Accumulo-iterator-to-return-a-random-sample-of-a-percentile-of-a-table-tp7354p7400.html
Sent from the Developers mailing list archive at Nabble.com.