spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: distributed computation of median
Date Mon, 17 Apr 2017 16:26:59 GMT
The DataFrame API includes an approximate quantile implementation. If you
ask for quantile 0.5, you will get an approximate median.
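
A minimal sketch of that call from PySpark, assuming a DataFrame with a
numeric column. `DataFrame.approxQuantile(col, probabilities, relativeError)`
is the PySpark method (in Scala it is reached through `df.stat`). The helper
name `approx_median`, the demo function, and the column name `x` are
hypothetical and used only for illustration.

```python
def approx_median(df, column, relative_error=0.01):
    """Return an approximate median of `column` for a Spark DataFrame.

    `relative_error` is passed through as approxQuantile's relativeError;
    0.0 requests exact quantiles, which can be very expensive.
    """
    # approxQuantile returns one value per requested probability,
    # so asking for [0.5] yields a one-element list.
    return df.approxQuantile(column, [0.5], relative_error)[0]


def run_local_demo():
    """Hypothetical local demo; requires pyspark to be installed."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("approx-median-demo")
        .getOrCreate()
    )
    try:
        # Single numeric column named "x" holding 1..1000.
        df = spark.range(1, 1001).toDF("x")
        print(approx_median(df, "x"))
    finally:
        spark.stop()
```

A smaller `relative_error` tightens the rank guarantee at the cost of more
memory and time, so the default of 0.01 here is only a starting point.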


On Sun, Apr 16, 2017 at 9:24 PM svjk24 <svjk24@gmail.com> wrote:

> Hello,
>   Is there any interest in an efficient distributed algorithm for
> computing the median?
> A Google search turns up some Stack Overflow discussion, but it would be
> good to have one provided.
>
> I have an implementation (that could be improved)
> from the paper "Fast Computation of the Median by Successive Binning":
>
> https://github.com/4d55397500/medianbinning
>
> Thanks-
>
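
For readers unfamiliar with the binning idea mentioned in the quoted
message, here is a rough single-machine sketch. It is not the code from the
linked repository and not a faithful reproduction of the paper: as a
simplifying assumption it starts from the [min, max] range of the data, and
the function name and parameters are hypothetical. The point it illustrates
is that each pass only needs a histogram, and histogram counts can be
summed across partitions, which is what makes the approach attractive in a
distributed setting.

```python
def binned_median(values, num_bins=64, small=32):
    """Lower median of `values` via repeated histogram refinement."""
    data = list(values)
    if not data:
        raise ValueError("median of empty data")
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")

    # 0-based rank of the lower median within the current candidates.
    k = (len(data) - 1) // 2

    while len(data) > small:
        lo, hi = min(data), max(data)
        if lo == hi:
            return lo  # all remaining candidates are equal
        width = (hi - lo) / num_bins
        if width == 0.0:
            break  # range too small to split; fall back to sorting

        def bin_of(v):
            # Clamp so that the maximum lands in the last bin.
            return min(int((v - lo) / width), num_bins - 1)

        # One pass to build the histogram (additive across partitions).
        counts = [0] * num_bins
        for v in data:
            counts[bin_of(v)] += 1

        # Walk the bins until the one holding rank k is found.
        seen = 0
        target = 0
        for b, c in enumerate(counts):
            if seen + c > k:
                target = b
                break
            seen += c

        # Keep only that bin and re-express k relative to it.
        data = [v for v in data if bin_of(v) == target]
        k -= seen

    # Few enough candidates remain to sort directly.
    return sorted(data)[k]
```

For an even number of values this returns the lower of the two middle
elements rather than their average.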
