mahout-user mailing list archives

From 3316 Chirag Nagpal <chiragnagpal_12...@aitpune.edu.in>
Subject Re: DBSCAN implementation in Mahout
Date Sun, 30 Nov 2014 13:10:59 GMT
Hi Ted,

Thanks for the reply.

I have been using DBSCAN in Python, the implementation in the scikit-learn package. For a dataset
with about 8k points, the running time on my Intel i7 4700 QM comes to around 300 seconds.

I have implemented a parallel version using Python's multiprocessing library, and the running
time comes down to about 100 to 120 seconds when I use 3 parallel threads.
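
The message does not say how the work was split, so the following is only a sketch of one
plausible structure, not the author's code. It assumes the expensive step is the
eps-neighbourhood query for each point, which is independent per point and can be handed to a
multiprocessing pool, while the cluster expansion stays serial. The names region_query,
all_neighbours and dbscan are hypothetical, and the distance search is brute force rather than
the indexed search a library implementation would normally use.

```python
import math
from functools import partial
from multiprocessing import Pool


def region_query(points, eps, i):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def all_neighbours(points, eps, workers=3):
    """Compute every eps-neighbourhood; each point is an independent task."""
    query = partial(region_query, points, eps)
    if workers <= 1:
        # Serial fallback, useful for comparing timings.
        return [query(i) for i in range(len(points))]
    with Pool(workers) as pool:
        return pool.map(query, range(len(points)), chunksize=64)


def dbscan(points, eps, min_pts, workers=3):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    neighbours = all_neighbours(points, eps, workers)
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                frontier.extend(neighbours[j])  # j is a core point: keep expanding
    return labels
```

When using more than one worker, call dbscan(points, eps, min_pts) from under an
`if __name__ == "__main__":` guard so that worker processes can import the module safely.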

Thus the speedup is almost n for n parallel workers. I think scalability should not be an issue
for a MapReduce implementation.
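
As a quick check of the arithmetic behind that claim, the timings quoted above (about 300
seconds serial, 100 to 120 seconds with 3 workers) can be turned into a speedup and a parallel
efficiency. The helper names speedup and efficiency are illustrative only.

```python
def speedup(serial_s, parallel_s):
    """How many times faster the parallel run is than the serial run."""
    return serial_s / parallel_s


def efficiency(serial_s, parallel_s, workers):
    """Speedup divided by worker count; 1.0 means perfectly linear scaling."""
    return speedup(serial_s, parallel_s) / workers


# Timings taken from the message: ~300 s serial, 100-120 s with 3 workers.
for parallel_s in (100, 120):
    s = speedup(300, parallel_s)
    e = efficiency(300, parallel_s, 3)
    print(f"{parallel_s} s: speedup {s:.2f}x, efficiency {e:.0%}")
# prints:
# 100 s: speedup 3.00x, efficiency 100%
# 120 s: speedup 2.50x, efficiency 83%
```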

Chirag Nagpal
University of Pune, India
www.chiragnagpal.com
________________________________________
From: Ted Dunning <ted.dunning@gmail.com>
Sent: Sunday, November 30, 2014 6:29 PM
To: user@mahout.apache.org
Subject: Re: DBSCAN implementation in Mahout

On Sat, Nov 29, 2014 at 8:31 PM, 3316 Chirag Nagpal <
chiragnagpal_12102@aitpune.edu.in> wrote:

> Since density-based clustering algorithms are being used extensively,
> especially by GIS research groups, it is a bit sad that there isn't a
> MapReduce implementation available.
>
> I think I will propose to write MapReduce code for DBSCAN and OPTICS for
> GSoC '15.
>
> I would like your input on how significant this would be to the
> community in general.
>

We have had proposals to add this to Mahout, but as far as I remember, no
credible requests to use it.

Also, there is the question of scalability of DBSCAN-like algorithms.
