Hi,
I asked the same question before, so I may be a good person to answer
it. Also, I have implemented DBSCAN.
You can cluster on similarity with spectral clustering, since it is based
on similarity values. The strength of DBSCAN, though, is that we don't need
to specify the number of clusters and we don't need prior knowledge of the
data distribution, which is where spectral clustering does not shine.
DBSCAN needs a distance matrix for the set of elements. If your dataset is
high dimensional, the computational cost of building the distance matrix
is O(N^2). Otherwise, you can use a spatial index such as a k-d tree to
reduce the cost of building it.
Once the distance matrix is built, the expand-cluster step of the DBSCAN
algorithm is easy to implement.
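
To make that concrete, here is a minimal single-machine sketch in Python,
not taken from any Mahout code. The function names (distance_matrix,
region_query, dbscan) and the parameters eps and min_pts are my own choices
for illustration, and the matrix is built by brute force in O(N^2).

```python
import math

NOISE = -1
UNVISITED = None


def distance_matrix(points):
    """Brute-force O(N^2) Euclidean distance matrix (illustrative only)."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def region_query(dist, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, d in enumerate(dist[i]) if d <= eps]


def dbscan(dist, eps, min_pts):
    """Label each point with a cluster id, or NOISE (-1)."""
    n = len(dist)
    labels = [UNVISITED] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(dist, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Expand cluster: grow outward from the core point i.
        labels[i] = cluster_id
        pending = list(neighbors)
        while pending:
            j = pending.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point, do not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(dist, j, eps)
            if len(j_neighbors) >= min_pts:
                pending.extend(j_neighbors)  # j is core, keep expanding
        cluster_id += 1
    return labels
```

Usage would look like dbscan(distance_matrix(points), eps, min_pts), where
points is a list of coordinate tuples.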
You can use Hadoop MapReduce to build the distance matrix. Also, if the
distance matrix is too big to fit in memory, or you don't want to use
either a disk-based search structure or another remote static resource,
you can replace the expand-cluster step of DBSCAN with a union-find
algorithm. Implementing union-find with MapReduce is not so difficult.
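
As a rough illustration of the union-find idea, here is a single-process
Python sketch. It is not MapReduce code, and the names (UnionFind,
dbscan_union_find, eps_edges) are hypothetical. It assumes an earlier step
has already produced the list of point pairs whose distance is within eps,
each unordered pair listed once, so the full matrix is never held in memory.

```python
class UnionFind:
    """Disjoint sets over 0..n-1; the root is the smallest member."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


def dbscan_union_find(n, eps_edges, min_pts):
    """Cluster n points given (i, j) pairs with i != j within eps.

    Returns one label per point: the root index of its cluster, or -1
    for noise. Assumes each unordered pair appears exactly once.
    """
    edges = list(eps_edges)
    degree = [1] * n  # each point counts itself as a neighbor
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    core = [d >= min_pts for d in degree]

    uf = UnionFind(n)
    border_owner = {}  # border point -> first core neighbor seen
    for i, j in edges:
        if core[i] and core[j]:
            uf.union(i, j)  # two core points share a cluster
        elif core[i]:
            border_owner.setdefault(j, i)
        elif core[j]:
            border_owner.setdefault(i, j)

    labels = []
    for p in range(n):
        if core[p]:
            labels.append(uf.find(p))
        elif p in border_owner:
            labels.append(uf.find(border_owner[p]))
        else:
            labels.append(-1)
    return labels
```

In a MapReduce setting, the degree count and the union passes would each be
spread over the edge list, but that part is left out of this sketch.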
Best, Jae
On Tue, Jan 31, 2012 at 7:34 PM, Vikas Pandya <vikasdp@yahoo.com> wrote:
> Hello,
>
> Does anybody know if there are any plans of including DBScan in Mahout?
> for that matter of fact, any density based algorithms in mahout?
>
