mahout-user mailing list archives

From Stuti Awasthi
Subject RE: Which text classification algo is best for the usecase?
Date Tue, 14 May 2013 12:17:56 GMT
Hey Jack,

Thanks for the response. Regarding your queries:

1. The number of classes I'll categorize into will be around 3-4, e.g. Problem, Solution, Idea.
2. The number of keywords or phrases can vary; it is not fixed. For now I'll take around
100 keywords/phrases, but this will grow later.

Stuti Awasthi

-----Original Message-----
From: Jacek Wasilewski
Sent: Tuesday, May 14, 2013 5:23 PM
Subject: Re: Which text classification algo is best for the usecase?


I'm new here and may not be a Mahout expert, but maybe I'll be able to help you somehow.

To better understand your problem, I have a few questions:
1. Can you provide an example of the classes that you'd like to learn? How many classes are there?
2. Do you know the total number of these "keywords/phrases", or does it vary?

Best wishes,
Jacek Wasilewski.

2013/5/14 Stuti Awasthi

> Hi,
> I want to perform text classification using Mahout. So far I have
> tried the Naïve Bayes algorithm, but I would like your suggestions on
> which algorithm would be better for my use case.
> Use case:
> I want to classify text based on custom "keywords/phrases". Can I
> create document vectors in which the features are these custom
> "keywords/phrases"? Basically, assume that I have a bag of words and
> phrases, and I want the classification to be based on them.
> How can we implement this in Mahout? Is there an existing algorithm
> that I can use?
> Thanks
> Stuti Awasthi
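
The feature idea in the quoted message (one vector component per custom keyword or phrase) can be sketched roughly as follows. This is plain Python for illustration only, not Mahout's API and not code from this thread; the vocabulary entries and the names `normalize` and `keyword_vector` are hypothetical. It assumes whole-word matching on lowercased text is what is wanted.

```python
import re

# Hypothetical vocabulary; in the use case above this would be the
# roughly 100 custom keywords/phrases, and it may grow over time.
VOCABULARY = [
    "not working", "error", "fixed by",
    "workaround", "would be nice", "suggest",
]


def normalize(text):
    """Lowercase the text and reduce it to alphanumeric tokens joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_vector(text, vocabulary=VOCABULARY):
    """Return one occurrence count per vocabulary entry, in vocabulary order.

    Single words and multi-word phrases are treated alike: an entry matches
    only on whole-token boundaries, so "error" does not match "errors".
    """
    doc = normalize(text)
    counts = []
    for term in vocabulary:
        # Zero-width lookarounds require whitespace (or the string edge)
        # on both sides of the match without consuming it.
        pattern = r"(?<!\S)" + re.escape(normalize(term)) + r"(?!\S)"
        counts.append(len(re.findall(pattern, doc)))
    return counts


if __name__ == "__main__":
    print(keyword_vector("Error: login not working. Another error!"))
```

Each document then becomes a fixed-length count vector whose length equals the vocabulary size, and such vectors could be labeled with the 3-4 classes mentioned in the reply and passed to a classifier such as the Naïve Bayes already tried. Because the vocabulary is expected to grow, vectors would need to be regenerated whenever entries are added.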