mahout-user mailing list archives

From Lokendra Singh
Subject Indexing the clustered data (lucene?)
Date Thu, 24 Feb 2011 17:04:01 GMT
Hi all,

I am looking for ways to analyze/index the output ('clusteredPoints') of
k-means clustering.
My input data for clustering are feature vectors extracted from a dataset
of images.

To give an analogy between my problem and Lucene text indexing: if 'n'
feature vectors (image points) of an image fall into one cluster, they are
considered similar, which can be treated as the same 'word/term' appearing
'n' times in a text document. In the end, I want to generate a TF-IDF
vector for each image.
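
To make the analogy concrete, here is a minimal sketch of the computation in
plain Python. It uses neither Lucene nor Mahout; the function name, the input
layout (image id mapped to a list of cluster ids), the sample data, and the
choice of idf = ln(N / df) are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(image_clusters):
    """image_clusters: dict of image id -> list of cluster ids, one per feature point.

    Hypothetical helper, not part of Lucene or Mahout.
    """
    n_images = len(image_clusters)
    # Term frequency: how many points of each image fall into each cluster.
    tf = {img: Counter(ids) for img, ids in image_clusters.items()}
    # Document frequency: number of images in which each cluster appears.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    # One sparse vector per image: cluster id -> tf * ln(N / df).
    return {
        img: {c: n * math.log(n_images / df[c]) for c, n in counts.items()}
        for img, counts in tf.items()
    }

# Made-up cluster assignments for three images.
example = {
    "img1": [0, 0, 2],
    "img2": [0, 1],
    "img3": [0, 1, 1],
}
vectors = tfidf_vectors(example)
```

With this weighting, a cluster that occurs in every image (cluster 0 above)
gets weight zero, just as a word that occurs in every document would.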

Is Lucene well suited to this purpose? My idea is to create a 'Document'
object for each image, with its "content" field holding the cluster IDs of
the clusters its points fall into. I am not sure this is a good approach,
though, since the "content" of the Document objects (cluster IDs) would have
to be updated continuously while iterating through the sequence file inside
the 'clusteredPoints' directory.
What other tools or approaches do people generally use to analyze/index
non-textual clustered data?
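
One possible way around the continuous updates is to group the cluster IDs by
image first and build each image's "content" only once, after the pass over
the points. The sketch below is plain Python, not Lucene or Mahout code. It
assumes every clustered point can be mapped back to its image id, which
depends on how the vectors were named; the function name, the "c<id>" token
format, and the sample pairs are hypothetical.

```python
from collections import defaultdict

def build_contents(assignments):
    """assignments: iterable of (image_id, cluster_id) pairs, in any order.

    Hypothetical helper: returns image id -> whitespace-separated cluster tokens.
    """
    per_image = defaultdict(list)
    for image_id, cluster_id in assignments:
        # Each point contributes one "term" named after its cluster.
        per_image[image_id].append("c%d" % cluster_id)
    # Each content string is built once, after all pairs have been seen.
    return {img: " ".join(tokens) for img, tokens in per_image.items()}

# Made-up (image id, cluster id) pairs, interleaved across images.
pairs = [("img1", 0), ("img2", 1), ("img1", 0), ("img1", 2), ("img2", 0)]
contents = build_contents(pairs)
```

This version keeps all assignments in memory, so it only illustrates the
idea; at scale the same grouping would need a sort or group-by on image id.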

PS: Until now, I have been using native data structures to analyse the
clustering results, but I am looking for scalable tools to handle this.

