spark-dev mailing list archives

From Xiangrui Meng <>
Subject Re: [MLlib] Contributing Algorithm for Outlier Detection
Date Tue, 21 Oct 2014 16:40:00 GMT
Hi Ashutosh,

The process you described is correct, with details documented in
. There is no outlier detection algorithm in MLlib. Before you start
coding, please open a JIRA and let's discuss which algorithms are
appropriate to include, because there are many outlier detection
algorithms. I'm not sure which one is general enough and easy to
implement in parallel. For example, I'm not familiar with the
algorithm you mentioned, while the one I'm familiar with is based on
leverage scores:


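For readers unfamiliar with Attribute Value Frequency (AVF), the algorithm
named in the quoted message below: as generally described, it scores each
categorical record by the average frequency of its attribute values and
treats the lowest-scoring records as outlier candidates. The following is
a minimal single-machine sketch in plain Python under that reading. It is
not MLlib code, and the function names avf_scores and lowest_scoring are
made up for illustration.

```python
from collections import Counter


def avf_scores(records):
    """Return one AVF score per record.

    Each record is a tuple of categorical values, all of the same length.
    The score is the mean, over attributes, of how many records share the
    record's value for that attribute. (Hypothetical helper, not a Spark API.)
    """
    if not records:
        return []
    num_attrs = len(records[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in records) for j in range(num_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(num_attrs)) / num_attrs
        for row in records
    ]


def lowest_scoring(records, k):
    """Return the indices of the k records with the lowest AVF scores."""
    scores = avf_scores(records)
    order = sorted(range(len(records)), key=lambda i: scores[i])
    return order[:k]


records = [
    ("red", "small"),
    ("red", "small"),
    ("red", "large"),
    ("blue", "tiny"),
]
print(avf_scores(records))         # [2.5, 2.5, 2.0, 1.0]
print(lowest_scoring(records, 1))  # [3]
```

The only global state is the per-attribute value counts, so a parallel
version would presumably need one counting pass over the data followed by
one scoring pass. Whether that maps well onto MLlib is exactly what the
JIRA discussion would have to settle.
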
On Tue, Oct 21, 2014 at 2:23 AM, Ashutosh <> wrote:
> Hi,
> I am new to Apache Spark (any open source project). I want to contribute to
> it. I found that MLlib has no algorithm for outlier detection yet.  By
> literature review I found the algorithm Attribute Value Frequency (AVF) is
> promising. Here is the link  DOI: 10.1109/ICTAI.2007.125
> By following the process I figured out that, I have to open a new feature
> request at JIRA ( Also, I have
> checked that no other issue is opened on "outlier detection".
> I would like to know whether this is the right way to go. What do the
> project owners have in mind for outlier detection? Also, is anybody
> working on parallel k-nearest neighbours?
> Apart from opening the feature request and then a pull request on git,
> how should I provide the test cases?
> Suggestions and guidance are welcome.
> Thanks,
> Ashutosh
