commons-dev mailing list archives

From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [Math] Contributions to the clustering module (maybe GSoC)
Date Sun, 08 Feb 2015 21:12:36 GMT
On 2/7/15 1:53 PM, Alina Ciobanu wrote:
> Hello,
>
> I finally figured out my schedule for this summer and the conclusion is
> that I would be able to dedicate about 20 hours per week to the GSoC
> project. As far as I understand, this is about half of what is expected
> from a GSoC student, so unfortunately I think I should not apply this
> year. I want to contribute to the Commons Math library nonetheless.

Patches / review / ideas are always welcome!

Phil
>
> Best regards,
> Alina
>       From: Thomas Neidhart <thomas.neidhart@gmail.com>
>  To: Commons Developers List <dev@commons.apache.org> 
>  Sent: Tuesday, February 3, 2015 1:17 AM
>  Subject: Re: [Math] Contributions to the clustering module (maybe GSoC)
>    
> On 02/02/2015 10:36 PM, Alina Ciobanu wrote:
>> Hello Thomas,
>>
>>
>> Thank you for the answer. I hope I will be able to clarify my schedule
>> for the summer in about a week from now, and I will then decide whether
>> I should apply to GSoC this year or not. I will let you know as soon as
>> I can. Until then, I will briefly describe my first ideas below:
>>
>>
>> 1. Spectral clustering [1] - It basically maps the data into a
>> lower-dimensional space (relying on the eigenvectors of the similarity
>> matrix) and performs (k-means) clustering there. This method can resolve
>> a wide variety of problems, regardless of the form of the clusters. It
>> could be implemented efficiently using the Commons Math linear algebra
>> module.
>>
>>
>> 2. Mean shift algorithm [2] - I didn't grasp all the details of the
>> algorithm yet, but I find it very interesting. As far as I understand,
>> it has been primarily used in pattern recognition and computer vision.
>> I discovered it while searching for an algorithm that does not require
>> the number of clusters as an input parameter. From this point of view,
>> I think it would be a good addition to Commons Math alongside DBSCAN.
>>
>>
>> 3. Clustering evaluation methods
>> 3.1. The Silhouette Coefficient [3] - accounts for the intra-cluster and
>> inter-cluster distance to assign a score in [-1, 1] to a clustering.
>> 3.2. External clustering evaluation [4] - when a gold standard is
>> available for the clustered data, it can be used to assess the
>> performance of a clustering algorithm.
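As an illustration of item 3.1, here is a minimal sketch of the mean silhouette score in plain Java. It assumes Euclidean distance and the usual definition s(i) = (b - a) / max(a, b), where a is the mean distance from point i to the other members of its own cluster and b is the smallest mean distance to any other cluster. The class and method names are hypothetical and are not part of the Commons Math API.

```java
/** Hypothetical helper, not a Commons Math class. */
final class SilhouetteSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Mean silhouette over all points. labels[i] must lie in [0, k),
     * and at least two clusters must be non-empty.
     */
    static double meanSilhouette(double[][] points, int[] labels, int k) {
        int n = points.length;
        int[] sizes = new int[k];
        for (int label : labels) {
            sizes[label]++;
        }
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            int own = labels[i];
            if (sizes[own] == 1) {
                continue; // by convention s(i) = 0 for a singleton cluster
            }
            // Sum of distances from point i to the members of each cluster.
            double[] sumDist = new double[k];
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    sumDist[labels[j]] += distance(points[i], points[j]);
                }
            }
            double a = sumDist[own] / (sizes[own] - 1);
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                if (c != own && sizes[c] > 0) {
                    b = Math.min(b, sumDist[c] / sizes[c]);
                }
            }
            double max = Math.max(a, b);
            total += (max == 0.0) ? 0.0 : (b - a) / max;
        }
        return total / n;
    }
}
```

A score close to 1 suggests compact, well-separated clusters, while a negative score suggests that many points sit closer to a neighbouring cluster than to their own. This version recomputes all pairwise distances and is therefore quadratic in the number of points.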
>>
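For item 2, the core idea of mean shift can be sketched as follows: each point is moved repeatedly to the kernel-weighted mean of the data until it stops moving, and points that end up at the same mode form one cluster. This is only a rough sketch assuming a Gaussian kernel, a fixed bandwidth, and an arbitrary merge threshold of half the bandwidth. All names and default values are hypothetical, and nothing here reflects an existing Commons Math class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not a Commons Math class. */
final class MeanShiftSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Shifts a start position towards a density mode using a Gaussian kernel. */
    static double[] shiftToMode(double[] start, double[][] points,
                                double bandwidth, int maxIter, double tol) {
        double[] x = start.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[x.length];
            double weightSum = 0.0;
            for (double[] p : points) {
                double dist = distance(p, x);
                double w = Math.exp(-(dist * dist) / (2.0 * bandwidth * bandwidth));
                weightSum += w;
                for (int d = 0; d < x.length; d++) {
                    next[d] += w * p[d];
                }
            }
            for (int d = 0; d < x.length; d++) {
                next[d] /= weightSum;
            }
            double moved = distance(next, x);
            x = next;
            if (moved < tol) {
                break; // converged to a mode (within tolerance)
            }
        }
        return x;
    }

    /** Assigns a label per point; the number of clusters is not an input. */
    static int[] cluster(double[][] points, double bandwidth) {
        List<double[]> modes = new ArrayList<>();
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            double[] mode = shiftToMode(points[i], points, bandwidth, 200, 1e-7);
            int label = -1;
            for (int m = 0; m < modes.size(); m++) {
                // Assumed merge rule: modes closer than bandwidth / 2 are the same.
                if (distance(mode, modes.get(m)) < bandwidth / 2.0) {
                    label = m;
                    break;
                }
            }
            if (label < 0) {
                modes.add(mode);
                label = modes.size() - 1;
            }
            labels[i] = label;
        }
        return labels;
    }
}
```

The bandwidth takes the place of the cluster count as the tuning parameter: a larger value merges nearby modes and yields fewer clusters. Each shift step scans all points, so this naive version costs on the order of n squared per iteration.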
>>
>> Suggestions are more than welcome. If you have requests from users for
>> specific clustering algorithms, please let me know.
> Your proposals sound good. As a pointer to already existing feature
> requests, you can take a look at:
>
>  * OPTICS algorithm - https://issues.apache.org/jira/browse/MATH-1190
>  * HAC algorithm - https://issues.apache.org/jira/browse/MATH-959
>
> Cluster evaluation would also be very interesting. I already wanted to
> do something in this direction but could not find the time.
>
> Btw, by coincidence, we received a reminder about this year's GSoC just
> today. The deadline to submit a project proposal with project ideas is
> 13-02-2015.
>
>
>
> Thomas
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>
>
>    

