[ https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294842#comment-15294842 ]
Gilles commented on MATH-1367:

Hi Amol.
Thanks for your report.
From the problem description, it is not obvious whether your proposed fix would have
unwanted side effects (e.g. "getNeighbors" may not have been meant to match exactly the
definition in the reference you cite; you may well be right, I'm just guessing since I
have never looked at that code).
Could you provide a unit test showing that it is indeed a bug (i.e. a case where the best
solution cannot be recovered unless the fix is applied)?
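
A minimal sketch of the kind of check such a test could make, written as a standalone re-implementation of the "getNeighbors" logic quoted in the issue description rather than against the Commons Math classes. The class name NeighborCountSketch, the includeSelf flag, and the isCore helper are hypothetical, and the sketch assumes a point is a core point when it has at least minPts neighbors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Standalone sketch; not the Commons Math implementation. */
final class NeighborCountSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Collects the points within eps of center. The reference check mirrors
     * the quoted code; includeSelf = true corresponds to removing that check.
     */
    static List<double[]> neighbors(double[] center, List<double[]> points,
                                    double eps, boolean includeSelf) {
        List<double[]> result = new ArrayList<>();
        for (double[] candidate : points) {
            boolean isSelf = candidate == center;
            if ((includeSelf || !isSelf) && distance(candidate, center) <= eps) {
                result.add(candidate);
            }
        }
        return result;
    }

    /** Assumed core-point test: at least minPts points in the neighborhood. */
    static boolean isCore(double[] center, List<double[]> points,
                          double eps, int minPts, boolean includeSelf) {
        return neighbors(center, points, eps, includeSelf).size() >= minPts;
    }

    public static void main(String[] args) {
        // Three collinear points, each 1.0 apart; eps = 1.0, minPts = 3.
        List<double[]> points = Arrays.asList(
            new double[] {0.0}, new double[] {1.0}, new double[] {2.0});
        double[] middle = points.get(1);
        System.out.println("excluding self: " + isCore(middle, points, 1.0, 3, false));
        System.out.println("including self: " + isCore(middle, points, 1.0, 3, true));
    }
}
```

With these three points the middle one has two neighbors when it is excluded and three when it is included, so it only qualifies as a core point for minPts = 3 in the second case. A real unit test would need to show the same difference through the public clustering API.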
> DBSCAN Implementation does not count the seed point itself as part of its neighbors count
> 
>
> Key: MATH-1367
> URL: https://issues.apache.org/jira/browse/MATH-1367
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 3.6.1
> Reporter: Amol Singh
> Fix For: 4.0
>
>
> The DBSCAN paper describes the Eps-neighborhood of a point as follows:
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point p, denoted
> by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q) < Eps}
> In other words, every point q in the database D whose distance from p is less than Eps
> should be classified as a neighbor. This should include the point itself.
> The implementation, however, has a reference check against the point itself and does not
> add it to its neighbors list.
> private List<T> getNeighbors(final T point, final Collection<T> points) {
>     final List<T> neighbors = new ArrayList<T>();
>     for (final T neighbor : points) {
>         if (point != neighbor && distance(neighbor, point) <= eps) {
>             neighbors.add(neighbor);
>         }
>     }
>     return neighbors;
> }
> The "point != neighbor" check should be removed here. Keeping this check effectively
> raises the minPts count by 1. Other third-party QuadTree-backed DBSCAN implementations
> count the center point among its neighbors, e.g. the bmwcarit library.
> If this is in fact by design, the check should use value equality instead of reference
> equality. T extends Clusterable<T>, so the client should be able to define this behavior.
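
To illustrate the difference between the two kinds of self-check, here is a small standalone sketch. SamplePoint, SelfCheckSketch, and countNeighbors are hypothetical names introduced for illustration, not Commons Math types, and the self-check is passed in as a predicate only so that both variants can be compared side by side.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;

/** Hypothetical 2D point with value-based equality; not a Commons Math type. */
final class SamplePoint {
    final double x;
    final double y;

    SamplePoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SamplePoint)) {
            return false;
        }
        SamplePoint p = (SamplePoint) other;
        return Double.compare(x, p.x) == 0 && Double.compare(y, p.y) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

final class SelfCheckSketch {

    /** Counts points within eps of center, skipping those the predicate treats as "self". */
    static int countNeighbors(SamplePoint center, List<SamplePoint> points, double eps,
                              BiPredicate<SamplePoint, SamplePoint> isSelf) {
        int count = 0;
        for (SamplePoint candidate : points) {
            double dist = Math.hypot(candidate.x - center.x, candidate.y - center.y);
            if (!isSelf.test(center, candidate) && dist <= eps) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SamplePoint a = new SamplePoint(0.0, 0.0);
        SamplePoint duplicate = new SamplePoint(0.0, 0.0); // same coordinates, different object
        SamplePoint c = new SamplePoint(0.5, 0.0);
        List<SamplePoint> points = Arrays.asList(a, duplicate, c);

        // Reference check: only the object a itself is skipped.
        System.out.println(countNeighbors(a, points, 1.0, (p, q) -> p == q));
        // Value check: the duplicate is skipped as well.
        System.out.println(countNeighbors(a, points, 1.0, (p, q) -> p.equals(q)));
    }
}
```

In this example the reference check counts two neighbors for a, while the value check counts one because the duplicate is also treated as the point itself. Whether duplicates should contribute to the density is therefore part of the design decision.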

