mahout-user mailing list archives

From "Chandra Mohan, Ananda Vel Murugan" <Ananda.Muru...@honeywell.com>
Subject RE: Which text classification algo is best for the usecase?
Date Wed, 15 May 2013 05:36:39 GMT
Sure. I will check this. 

Regards
Anand.C

-----Original Message-----
From: Stuti Awasthi [mailto:stutiawasthi@hcl.com] 
Sent: Wednesday, May 15, 2013 11:01 AM
To: user@mahout.apache.org
Subject: RE: Which text classification algo is best for the usecase?

Yes, there are scalability issues with R. Have you looked at RHadoop? I haven’t tried it,
but it may be worth a look if you have already worked with R and Hadoop.

Thanks
Stuti

-----Original Message-----
From: Chandra Mohan, Ananda Vel Murugan [mailto:Ananda.Murugan@honeywell.com] 
Sent: Wednesday, May 15, 2013 10:57 AM
To: user@mahout.apache.org
Subject: RE: Which text classification algo is best for the usecase?

Hi, 

I used R for text classification. I tried SVM and maximum entropy, and they gave decent results.
But when my dataset became huge, they did not scale.

Algorithms like bagging and boosting need a lot of processing power. Most of the time, my R code
would fail with a memory error.

Regards,
Anand.C

-----Original Message-----
From: Stuti Awasthi [mailto:stutiawasthi@hcl.com]
Sent: Wednesday, May 15, 2013 10:53 AM
To: user@mahout.apache.org
Subject: RE: Which text classification algo is best for the usecase?

Thanks Jacek,

I will try to look at these algorithms as well. Thanks for the pointers :)

Regards
Stuti

-----Original Message-----
From: Jacek Wasilewski [mailto:wasilek@gmail.com]
Sent: Wednesday, May 15, 2013 4:52 AM
To: user@mahout.apache.org
Subject: Re: Which text classification algo is best for the usecase?

Dear Stuti,

Thanks for those answers.

As far as I know, Naive Bayes handles text classification pretty well; the most common
example of Naive Bayes usage is spam classification.
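
As a rough illustration of Naive Bayes on the custom keyword/phrase features described in the original question, here is a minimal Python sketch. It is not Mahout code. The keyword list, the training sentences, and the function names featurize, train, and predict are all made up for illustration; only the class names Problem, Solution, and Idea come from this thread.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword/phrase list (the thread only says "around 100").
KEYWORDS = ["does not work", "error", "fix", "workaround", "what if", "suggest"]

def featurize(text, keywords=KEYWORDS):
    """Count occurrences of each keyword/phrase (plain substring match)."""
    lowered = text.lower()
    return [lowered.count(k) for k in keywords]

def train(docs, labels, keywords=KEYWORDS, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    docs_per_class = Counter(labels)
    totals = defaultdict(lambda: [0] * len(keywords))
    for text, label in zip(docs, labels):
        for i, count in enumerate(featurize(text, keywords)):
            totals[label][i] += count
    model = {}
    for label, counts in totals.items():
        denom = sum(counts) + alpha * len(keywords)
        log_prior = math.log(docs_per_class[label] / len(labels))
        log_likelihood = [math.log((c + alpha) / denom) for c in counts]
        model[label] = (log_prior, log_likelihood)
    return model

def predict(model, text, keywords=KEYWORDS):
    """Return the class with the highest log posterior score."""
    x = featurize(text, keywords)
    scores = {
        label: log_prior + sum(n * ll for n, ll in zip(x, log_likelihood))
        for label, (log_prior, log_likelihood) in model.items()
    }
    return max(scores, key=scores.get)

# Made-up training examples, two per class.
docs = [
    "The export does not work and shows an error",
    "Login page throws an error every morning",
    "Here is a fix: apply the workaround from the wiki",
    "A quick fix is to restart the service",
    "What if we cache the results? I suggest trying it",
    "I suggest a dark mode, what if users could toggle it",
]
labels = ["Problem", "Problem", "Solution", "Solution", "Idea", "Idea"]

model = train(docs, labels)
print(predict(model, "Upload does not work, I get an error"))
```

Restricting the features to a fixed keyword list keeps the vectors small, but plain substring matching will also count a keyword inside a longer word (for example "fix" inside "prefix"), so a real setup would likely need proper tokenization.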

I think you could also try SVM (Support Vector Machines) and Boosting.
Some time ago I read some papers where the results of these algorithms in text classification were
very good. Unfortunately I haven't had the opportunity to implement such a problem using Mahout,
so you will have to try it yourself, or maybe some Mahout expert could say a word on how to do this.

I can only advise that you check the results of these methods using, e.g.,
Weka or RapidMiner before implementing them with Mahout.

I hope I helped a little bit, and I'm sorry that I couldn't help (yet) with Mahout.

Best wishes,
Jacek Wasilewski.


2013/5/14 Stuti Awasthi <stutiawasthi@hcl.com>

> Hey Jack,
>
> Thanks for the response. Regarding your queries:
>
> 1. The classes into which I'll categorize will number around 3-4, e.g.
> Problem, Solution, Idea, etc.
> 2. The number of keywords or phrases can vary. It is not fixed.
> For now I'll take around 100 keywords/phrases, but later on this will grow.
>
> Thanks
> Stuti Awasthi
>
> -----Original Message-----
> From: Jacek Wasilewski [mailto:wasilek@gmail.com]
> Sent: Tuesday, May 14, 2013 5:23 PM
> To: user@mahout.apache.org
> Subject: Re: Which text classification algo is best for the usecase?
>
> Hi,
>
> I'm new here and maybe I'm not an expert in Mahout, but maybe I'll
> be able to help you somehow.
>
> To better understand your problem I have a few questions:
> 1. Can you provide an example of the classes that you'd like to learn? How
> many classes are there?
> 2. Do you know the total number of these "keywords/phrases", or does it
> vary?
>
> Best wishes,
> Jacek Wasilewski.
>
>
> 2013/5/14 Stuti Awasthi <stutiawasthi@hcl.com>
>
> > Hi,
> >
> > I want to perform text classification using Mahout. For now I have
> > tried the Naïve Bayes algorithm, but I would like your suggestions on which
> > algorithm will be better for my use case.
> >
> > Use case:
> >
> > I want to classify text based on custom "keywords/phrases". Can I
> > create vectors of the documents in which the features are the custom
> > "keywords/phrases"? Basically, assume that I have a bag of words
> > and phrases, and I want the classification to be based on them.
> >
> > How can we implement such a problem in Mahout? Is there an
> > existing algorithm which I can use?
> >
> > Thanks
> > Stuti Awasthi