mahout-user mailing list archives

From Ankit Goel <ankitgoel2...@gmail.com>
Subject Re: Mahout clustering from lucene index
Date Sat, 10 Oct 2015 16:08:29 GMT
Hi,
In kmeans you need to specify the number of clusters and a directory for the initial
cluster vectors. If you want random initial vectors, specify k (-k 5) together with the
directory for the initial vectors; in this case that is where they will be saved. The
directory is given with -c, e.g. -c ./cluster-directory/initial (that's my preference).
You can of course use any location.
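
As a rough sketch, the full invocation could look like the script below. The input and output paths, the iteration limit (-x 10) and the -ow/-cl flags are only assumptions for illustration; the -k and -c values are the ones mentioned above. The script only prints the command, so nothing runs until you uncomment the last line.

```shell
#!/bin/sh
# Hypothetical paths: adjust to wherever lucene.vector wrote its output.
VECTORS=./vectors/out.vec                # assumed output of lucene.vector
INITIAL=./cluster-directory/initial      # random initial clusters are saved here (-c)
OUTPUT=./cluster-directory/output        # assumed location for the kmeans result
K=5                                      # number of clusters (-k)

# -x  maximum number of iterations (10 is an arbitrary choice)
# -ow overwrite the output directory if it already exists
# -cl assign the input vectors to clusters after the iterations finish
CMD="mahout kmeans -i $VECTORS -c $INITIAL -o $OUTPUT -k $K -x 10 -ow -cl"

# Dry run: show the command instead of executing it.
echo "$CMD"

# Uncomment to actually run the job:
# $CMD
```

Because -k is given, the initial clusters are sampled at random from the input vectors and written to the -c directory. If you leave -k out, kmeans expects the -c directory to already contain initial clusters.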

> On 10-Oct-2015, at 7:47 pm, Cristian Barrientos Montoya <cs3kns@gmail.com> wrote:
> 
> Hi there,
> I've been trying to run kmeans clustering on a lucene index, after creating
> the vectors with the command-line tool "lucene.vector", but the kmeans algorithm
> also needs a clusters input "-c", and I don't know where or how to get these.
> Would you give me some advice or another way to do the kmeans clustering?
> 
> My case scenario is:
> I have a lot of resources crawled with apache nutch. The resources are in apache
> solr (v 5.2), so I exported them to a json file to create an index in lucene (v
> 4.6). The resources look something like:
> 
> {
> "title": "Title #1",
> "summary": "summary of the resource",
> "url": "www.urlresources.com/resourceId.jpg",
> "description": "Some description",
> "extension": "jpg",
> "subject": "Subject of the resource",
> "area": "resource area"
> }
> 
> This is how I am indexing to lucene:
> https://gist.github.com/ColadaFF/1d6557ebaa147753bc9f
> 
> And the way I am generating vectors is the same as the example on the
> mahout page:
> https://mahout.apache.org/users/basics/creating-vectors-from-text.html
> 
> Am I in the right direction or should I use classification?
> 
> I'm also reading some resources, but none of them say what to do with
> the lucene vectors, so any resource you can point me to would be great.
> 
> Thanks all of you!

