commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiaopaner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-1367) DBSCAN Implementation does not count the seed point itself as part of its neighbors count
Date Fri, 12 Jan 2018 08:51:00 GMT

    [ https://issues.apache.org/jira/browse/MATH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323717#comment-16323717
] 

jiaopaner commented on MATH-1367:
---------------------------------

Hello,Amol
    I quite agree with your opinion .I also found this problem when I use the DBSCAN algorithm.The
version I used is apache-commons -math-3.6.1.
So the problem still exists,and I also found other problems in addition to the above problem.The
time complexity of the algorithm is always n^2 when I test.Besides that,there are many no
necessary operations in the algorithm implementation.BTW,why only DoublePoint, no StringPoint,
so I can't use Levenshtein distance to search the adjacency points.I'll create a detailed
bug report .

> DBSCAN Implementation does not count the seed point itself as part of its neighbors count
> -----------------------------------------------------------------------------------------
>
>                 Key: MATH-1367
>                 URL: https://issues.apache.org/jira/browse/MATH-1367
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.6.1
>            Reporter: Amol Singh
>             Fix For: 4.0
>
>
> The DSCAN paper describes the eps-neighborhood of a point as 
> https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
> Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point p, denoted
by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps} 
> in other words for all q points that are a member of database D whose distance from p
is less that Eps should be classified as a neighbor. This should include the point itself.

> The implementation however has a reference check to the point itself and does not add
it to its neighbors list.
> private List<T> getNeighbors(final T point, final Collection<T> points) {
>         final List<T> neighbors = new ArrayList<T>();
>         for (final T neighbor : points) {
>             if (point != neighbor && distance(neighbor, point) <= eps) {
>                 neighbors.add(neighbor);
>             }
>         }
>         return neighbors;
>     } 
> "point != neighbor "  check should be removed here. Keeping this check effectively is
raising the minPts count by 1. Other third party QuadTree backed DBSCAN implementations consider
the center point in its neighbor count E.g. bmw-carit library. 
> If this is infact by design, the check should use value equality instead of reference
equality. T extends Clusterable<T> , the client should be able to define this behavior.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message