mahout-user mailing list archives

From Cristian Barrientos Montoya <cs3...@gmail.com>
Subject Mahout clustering from lucene index
Date Sat, 10 Oct 2015 14:17:04 GMT
Hi there,
I've been trying to run kmeans clustering on a Lucene index. I created the
vectors with the "lucene.vector" command-line tool, but the kmeans algorithm
also needs an initial clusters input ("-c"), and I don't know where to get it
or how to create it. Could you give me some advice, or suggest another way to
do the k-means clustering?
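For context on the "-c" input: k-means needs a set of starting centroids before it can iterate. According to the Mahout k-means documentation, passing "-k <number>" to the kmeans driver makes it pick k random input vectors as the initial centroids and write them to the "-c" path, so that directory does not have to be prepared by hand. The small in-memory Python sketch below is not Mahout code. It only illustrates what those seeds are used for, assuming dense vectors, squared Euclidean distance, and random seeding from the input as a stand-in for what "-k" does.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two dense vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 20,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    rng = random.Random(seed)
    # Initial centroids: k distinct input vectors chosen at random. This plays
    # the role of the "-c" clusters input (assumption: similar in spirit to "-k").
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [-1] * len(points)  # -1 never matches a real cluster index
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```

With two well-separated groups of points and k=2, the sketch puts each group in its own cluster; which label each group receives depends on the random seed, which is also why real runs with random seeds can differ from one another.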

My scenario is:
I have a lot of resources crawled with Apache Nutch and stored in Apache
Solr (v5.2). I exported them to a JSON file and used that file to build a
Lucene (v4.6) index. The resources look something like this:

{
"title": "Title #1",
"summary": "summary of the resource",
"url": "www.urlresources.com/resourceId.jpg",
"description": "Some description",
"extension": "jpg",
"subject": "Subject of the resource",
"area": "resource area"
}

This is how I index them into Lucene:
https://gist.github.com/ColadaFF/1d6557ebaa147753bc9f

I generate the vectors the same way as the example on the Mahout
page:
https://mahout.apache.org/users/basics/creating-vectors-from-text.html
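To make the vector-generation step more concrete, here is a rough Python sketch of turning records shaped like the JSON above into sparse TF-IDF vectors. It only approximates the idea behind "lucene.vector", which reads term vectors from the Lucene index and has its own weighting options. The field list (`TEXT_FIELDS`), the tokenizer, and the plain tf × log(N/df) weight are hypothetical choices for illustration, not what the tool necessarily does.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical choice of the JSON fields that carry text worth clustering on;
# "url" and "extension" are left out because they are mostly identifiers.
TEXT_FIELDS = ("title", "summary", "description", "subject")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(records: List[Dict[str, str]]) -> List[Dict[str, float]]:
    """Build one sparse {term: weight} vector per record using tf * log(N / df)."""
    docs = [tokenize(" ".join(r.get(f, "") for f in TEXT_FIELDS)) for r in records]
    n = len(docs)
    # Document frequency: in how many records each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Terms present in every record have idf = log(1) = 0, so they are dropped.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items() if df[t] < n})
    return vectors
```

For example, with two records whose titles are "Title #1" and "Title #2", the shared token "title" gets weight zero and is left out of both vectors, so words common to every resource do not influence the clustering.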

Am I heading in the right direction, or should I use classification instead?

I'm also reading up on this, but none of the resources I've found explain what
to do with the Lucene vectors, so any pointers you can give would be great.

Thanks all of you!
