spark-dev mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Spark streaming quantile?
Date Mon, 09 Dec 2013 08:05:43 GMT
Thanks all for the suggestions.  Exactly what I was looking for.

-Sandy


On Thu, Dec 5, 2013 at 5:00 AM, Sam Bessalah <samkiller@gmail.com> wrote:

> As stated before, Algebird has many data structures to compute those, like
> QTree, or Ted's t-digest. Or you can look at the q-digest in stream-lib:
> https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java
> Another option is Frugal Streaming, which is well described, with an
> implementation, on the AK blog:
>
> http://blog.aggregateknowledge.com/2013/09/16/sketch-of-the-day-frugal-streaming/
> There are some examples in the Spark Streaming samples of how to integrate
> Algebird.
> Sam Bessalah
>
> > On Dec 5, 2013, at 5:41 AM, Ryan Weald <ryan@weald.com> wrote:
> >
> > Hi Sandy,
> > You could take a look at the QTree data structure provided by Twitter's
> > Algebird:
> > https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/QTree.scala
> > Because QTrees are combined through Algebird's Semigroup, whose merge
> > operation is associative, they are ideally suited to streaming computations.
> >
> > -Ryan
> >
> >
> >> On Wed, Dec 4, 2013 at 8:32 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
> >>
> >> Hi All,
> >>
> >> We're working on a Spark application that could make use of computing
> >> quantiles in a streaming fashion, something in the vein of what DataFu
> >> has for Pig:
> >>
> >> http://linkedin.github.io/datafu/docs/current/datafu/pig/stats/StreamingQuantile.html
> >>
> >> Does anything like this exist in the Spark ecosystem? If not, would there
> >> be a good place to contribute this if we write it?
> >>
> >> thanks,
> >> Sandy
> >>
>
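The Frugal Streaming approach Sam points to keeps a single number of state per quantile. Below is a minimal Python sketch of the Frugal-1U update rule described in that AK blog post, assuming integer-valued items. The function name `frugal_1u` and its parameters are hypothetical; this is not the blog's code or any library's API.

```python
import random


def frugal_1u(stream, q, initial=0, rng=None):
    """Estimate the q-quantile of an integer stream with Frugal-1U.

    The only state is the current estimate m. For each item s, m moves
    up by 1 with probability q when s > m, and down by 1 with
    probability 1 - q when s < m; at the true q-quantile the expected
    moves cancel out.
    """
    rng = rng or random.Random()
    m = initial
    for s in stream:
        r = rng.random()
        if s > m and r > 1 - q:
            m += 1  # item above estimate: nudge up (prob. q)
        elif s < m and r > q:
            m -= 1  # item below estimate: nudge down (prob. 1 - q)
    return m


# Usage: estimate the median and 90th percentile of uniform ints in [0, 1000].
data_rng = random.Random(1)
stream = [data_rng.randint(0, 1000) for _ in range(200_000)]
print(frugal_1u(stream, 0.5, rng=random.Random(2)))
print(frugal_1u(stream, 0.9, rng=random.Random(3)))
```

Both estimates should settle close to 500 and 900 respectively. Because the estimate moves by at most 1 per item, it adapts slowly when values are large compared with the starting point; the trade-off is constant memory per quantile.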
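Ryan's point about associativity is what makes such summaries fit micro-batch streaming: each batch can be summarized independently, and the partial summaries can be merged in any grouping. The Python sketch below illustrates only that semigroup property with a hypothetical fixed-width histogram, `BucketSummary`. It is a deliberately simple stand-in, not Algebird's QTree or its API, and its quantile answers are only accurate to one bucket width.

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class BucketSummary:
    """Hypothetical mergeable quantile summary (NOT Algebird's QTree).

    Values are counted in fixed-width buckets, so quantile answers are
    only accurate to one bucket width.
    """
    width: int
    counts: Counter = field(default_factory=Counter)

    @staticmethod
    def of(values, width):
        # Summarize one batch: bucket index -> number of values in it.
        return BucketSummary(width, Counter(v // width for v in values))

    def plus(self, other):
        # The semigroup operation: adding bucket counts is associative,
        # so partial summaries can be combined in any grouping.
        if self.width != other.width:
            raise ValueError("bucket widths differ")
        return BucketSummary(self.width, self.counts + other.counts)

    def quantile(self, q):
        # Lower bound of the bucket that contains the q-quantile.
        target = q * sum(self.counts.values())
        seen = 0
        for b in sorted(self.counts):
            seen += self.counts[b]
            if seen >= target:
                return b * self.width
        raise ValueError("empty summary")


# Usage: one summary per micro-batch, folded into a running total.
batches = [[1, 5, 12, 40], [7, 33, 18], [25, 2, 9, 30]]
parts = [BucketSummary.of(b, width=10) for b in batches]
running = reduce(BucketSummary.plus, parts)
print(running.quantile(0.5))  # prints 10: the median, 12, is in [10, 20)
```

Structures such as QTree or q-digest avoid fixing a bucket width up front by summarizing over a hierarchy of ranges, but the summarize-per-batch, merge, then query pattern is the same.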
