mahout-user mailing list archives

From Andreas Spalas <>
Subject Help for Grouping similar items together. Clustering/Classification problem?
Date Mon, 21 Jul 2014 19:43:29 GMT

These days I am exploring the Mahout framework in order to solve a specific
problem: I have a CSV file with 1.5 million items (products) in the
following format:
id, product_title
1, Apple IPHONE 5
2, Samsung Galaxy S5
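
For reference, here is a minimal Python sketch of reading this format. It assumes the file has the header row shown above and that any title containing a comma is quoted according to normal CSV rules. The SAMPLE string is a stand-in for the real file.

```python
import csv
import io

# Stand-in for the real file: a header row "id, product_title"
# followed by one product per line, with a space after each comma.
SAMPLE = """id, product_title
1, Apple IPHONE 5
2, Samsung Galaxy S5
"""

def read_products(text):
    """Return a list of (id, title) tuples from CSV text with a header row."""
    # skipinitialspace=True drops the blank that follows each comma,
    # in the header as well as in the data rows.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [(int(row["id"]), row["product_title"]) for row in reader]

products = read_products(SAMPLE)
print(products)  # [(1, 'Apple IPHONE 5'), (2, 'Samsung Galaxy S5')]
```

For the full 1.5 million rows, the same DictReader call works on a file opened with open(path, newline=""), where path is a hypothetical location of the CSV file.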

I would like to group the items (products) by category. For example, both
products above would fall under a "Technology" or "Smartphones" category.
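
Grouping by title requires some notion of similarity between titles. A minimal sketch, assuming a simple bag-of-words representation (lower-cased alphanumeric tokens, raw counts) compared with cosine similarity, is shown below. This is a common choice and not necessarily what Mahout does by default. "Apple iPhone 5S" is a hypothetical extra title added for contrast.

```python
import math
import re
from collections import Counter

def tokenize(title):
    """Lower-case a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())

def term_vector(title):
    """Sparse term-frequency vector: token -> count."""
    return Counter(tokenize(title))

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    # Counter returns 0 for missing tokens, so non-shared tokens add nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

iphone = term_vector("Apple IPHONE 5")
galaxy = term_vector("Samsung Galaxy S5")
iphone_5s = term_vector("Apple iPhone 5S")  # hypothetical title

print(cosine(iphone, galaxy))     # 0.0: the two titles share no tokens
print(cosine(iphone, iphone_5s))  # about 0.67: two of three tokens shared
```

Under this representation, the two example titles have no words in common, so their similarity is zero. Title text alone would therefore not place them in the same group without extra information such as labels or richer features.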

I would like to know whether this can be handled in Mahout, and whether one
would approach such a problem as clustering or as classification.
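
To illustrate the clustering route, here is a minimal plain-Python sketch of Lloyd's k-means on L2-normalised term-frequency vectors. It is not Mahout's clustering API. The four titles and the choice of initial centroids are hypothetical, picked so the result is deterministic. Clustering needs no labels, but its output is cluster numbers, not names such as "Smartphones".

```python
import math
import re
from collections import Counter

def tokenize(title):
    """Lower-case a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())

def vectorize(titles):
    """Dense, L2-normalised term-frequency vectors over a shared vocabulary."""
    vocab = sorted({tok for t in titles for tok in tokenize(t)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for t in titles:
        v = [0.0] * len(vocab)
        for tok, count in Counter(tokenize(t)).items():
            v[index[tok]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, initial_centroids, iterations=10):
    """Plain Lloyd's k-means; returns the cluster index of each vector."""
    centroids = [list(c) for c in initial_centroids]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(len(centroids)), key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

titles = [  # hypothetical product titles
    "Apple iPhone 5",
    "Apple iPhone 5S",
    "Samsung Galaxy S5",
    "Samsung Galaxy S4",
]
vectors = vectorize(titles)
labels = kmeans(vectors, initial_centroids=[vectors[0], vectors[2]])
print(labels)  # [0, 0, 1, 1]
```

The grouping follows shared words (brand and model line), which is why the vectorization step matters so much for this kind of data.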

I am currently studying "Mahout in Action", and I saw that for clustering I
have to transform my data into a SequenceFile and find a way to vectorize
it; I don't yet understand whether this is applicable to my case. For
classification, I understand that I have to provide training data with a
target variable (in my case "Category") in order to build a model for the
classifier. I can extend my dataset with this extra information, but is it
going to work?
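
To illustrate the classification route, here is a minimal plain-Python sketch of a multinomial naive Bayes classifier with add-one smoothing, trained on titles labelled with a category. It is not Mahout's classifier API. The training titles and the category names "Smartphones" and "Cameras" are hypothetical stand-ins for the extra "Category" column.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(title):
    """Lower-case a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())

class NaiveBayes:
    """Multinomial naive Bayes over title tokens with add-one (Laplace) smoothing."""

    def fit(self, titles, categories):
        self.class_counts = Counter(categories)
        self.token_counts = defaultdict(Counter)  # category -> token -> count
        for title, category in zip(titles, categories):
            self.token_counts[category].update(tokenize(title))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.total = len(categories)
        return self

    def predict(self, title):
        """Return the category with the highest log-posterior for the title."""
        best, best_score = None, -math.inf
        for category, n in self.class_counts.items():
            counts = self.token_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(category) + sum over tokens of log P(token | category)
            score = math.log(n / self.total)
            for tok in tokenize(title):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best

# Hypothetical labelled training data (title, category).
train = [
    ("Apple iPhone 5", "Smartphones"),
    ("Samsung Galaxy S5", "Smartphones"),
    ("Nokia Lumia 920", "Smartphones"),
    ("Canon EOS 600D", "Cameras"),
    ("Nikon D3200 camera", "Cameras"),
]
titles, categories = zip(*train)
model = NaiveBayes().fit(titles, categories)

print(model.predict("Samsung Galaxy S4"))  # Smartphones
print(model.predict("Nikon camera"))       # Cameras
```

Whether this works on the real data depends mostly on how many labelled titles per category are available, since the model can only use tokens it has seen during training.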

Can anyone give me some advice on how to handle this particular problem? Is
it even possible to do it in Mahout? Any direction would be appreciated!

Thanks a lot in advance.
