mahout-user mailing list archives

From Jacek Wasilewski <wasi...@gmail.com>
Subject Re: Which text classification algo is best for the usecase?
Date Wed, 15 May 2013 10:23:14 GMT
If you're interested in solutions based on R/Matlab/Octave, you may find
these resources interesting:
https://class.coursera.org/ml/lecture/preview - a whole course on machine
learning that uses such mathematical tools to solve its problems,
http://www.stanford.edu/class/cs246/handouts.html - another very good
course, about working with massive datasets

Have a nice day! :)

Jacek.


2013/5/15 Stuti Awasthi <stutiawasthi@hcl.com>

> Yes, there are scalability issues with R. Have you looked at RHadoop? I
> haven’t tried it, but you could look at it if you have already worked with R
> and Hadoop.
>
> Thanks
> Stuti
>
> -----Original Message-----
> From: Chandra Mohan, Ananda Vel Murugan [mailto:
> Ananda.Murugan@honeywell.com]
> Sent: Wednesday, May 15, 2013 10:57 AM
> To: user@mahout.apache.org
> Subject: RE: Which text classification algo is best for the usecase?
>
> Hi,
>
> I used R for text classification. I tried SVM and Maximum Entropy. They
> gave decent results, but when my dataset became huge, they did not
> scale.
>
> Algorithms like Bagging and Boosting need a lot of processing power. Most
> of the time, my R code would fail with a memory error.
>
> Regards,
> Anand.C
>
> -----Original Message-----
> From: Stuti Awasthi [mailto:stutiawasthi@hcl.com]
> Sent: Wednesday, May 15, 2013 10:53 AM
> To: user@mahout.apache.org
> Subject: RE: Which text classification algo is best for the usecase?
>
> Thanks Jacek,
>
> I will look at these algorithms as well. Thanks for the pointers :)
>
> Regards
> Stuti
>
> -----Original Message-----
> From: Jacek Wasilewski [mailto:wasilek@gmail.com]
> Sent: Wednesday, May 15, 2013 4:52 AM
> To: user@mahout.apache.org
> Subject: Re: Which text classification algo is best for the usecase?
>
> Dear Stuti,
>
> Thanks for those answers.
>
> As far as I know, Naive Bayes handles text classification pretty well
> - the most common example of Naive Bayes usage is spam classification.
>
> I think you could also try SVM (Support Vector Machines) and Boosting.
> Some time ago I read some papers where the results of these algorithms in
> text classification were very good. Unfortunately I haven't had the
> opportunity to implement such a problem using Mahout, so you will have to
> try it yourself, or maybe some Mahout expert could say a word on how to do
> this.
>
> I can only advise that you check the results of these methods using e.g.
> Weka or RapidMiner before implementing them with Mahout.
>
> I hope I have helped a little bit, and I'm sorry that I couldn't help (yet)
> with Mahout.
>
> Best wishes,
> Jacek Wasilewski.
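
Editor's note: the Naive Bayes suggestion above can be tried on a toy scale before moving to Mahout. The sketch below is a minimal multinomial Naive Bayes with add-one smoothing in plain Python. It is not Mahout's implementation; the function names (train_nb, predict_nb), the three features, and the training vectors are all made up for illustration. Only the class names Problem, Solution, and Idea come from the thread.

```python
import math
from collections import defaultdict

def train_nb(vectors, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    n_features = len(vectors[0])
    class_docs = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for vec, label in zip(vectors, labels):
        class_docs[label] += 1
        counts = feature_counts[label]
        for i, c in enumerate(vec):
            counts[i] += c
    model = {}
    total_docs = len(labels)
    for label, n_docs in class_docs.items():
        counts = feature_counts[label]
        denom = sum(counts) + alpha * n_features
        log_prior = math.log(n_docs / total_docs)
        log_likelihood = [math.log((c + alpha) / denom) for c in counts]
        model[label] = (log_prior, log_likelihood)
    return model

def predict_nb(model, vec):
    """Return the class with the highest log posterior for a count vector."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(c * ll for c, ll in zip(vec, log_likelihood))
    return max(model, key=score)

# Hypothetical count vectors over three made-up keyword features:
# ["error", "workaround", "suggest"]
train_vectors = [[2, 0, 0], [1, 0, 0], [0, 2, 0], [0, 1, 0], [0, 0, 2], [0, 0, 1]]
train_labels = ["Problem", "Problem", "Solution", "Solution", "Idea", "Idea"]

model = train_nb(train_vectors, train_labels)
print(predict_nb(model, [1, 0, 0]))
```

With only 3-4 classes and a small feature set, a model like this is cheap to train, so it can serve as a baseline when comparing against SVM or Boosting in a tool such as Weka.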
>
>
> 2013/5/14 Stuti Awasthi <stutiawasthi@hcl.com>
>
> > Hey Jack,
> >
> > Thanks for the response. Regarding your queries:
> >
> > 1. The classes into which I'll categorize will number 3-4, e.g.
> > Problem, Solution, Idea, etc.
> > 2. The number of keywords or phrases can vary. It is not fixed.
> > For now I'll take around 100 keywords/phrases, but later on this will grow.
> >
> > Thanks
> > Stuti Awasthi
> >
> > -----Original Message-----
> > From: Jacek Wasilewski [mailto:wasilek@gmail.com]
> > Sent: Tuesday, May 14, 2013 5:23 PM
> > To: user@mahout.apache.org
> > Subject: Re: Which text classification algo is best for the usecase?
> >
> > Hi,
> >
> > I'm new here and may not be an expert in Mahout, but maybe I'll
> > be able to help you somehow.
> >
> > To understand your problem better, I have a few questions:
> > 1. Can you provide an example of the classes that you'd like to learn? How
> > many classes are there?
> > 2. Do you know the total number of these "keywords/phrases", or does it
> > vary?
> >
> > Best wishes,
> > Jacek Wasilewski.
> >
> >
> > 2013/5/14 Stuti Awasthi <stutiawasthi@hcl.com>
> >
> > > Hi,
> > >
> > > I want to perform text classification using Mahout. For now I have
> > > tried the Naïve Bayes algorithm, but I would like your suggestions on
> > > which algorithm would be better for my use case.
> > >
> > > Use case:
> > >
> > > I want to classify text based on custom "keywords/phrases". Can I
> > > create vectors of the documents in which the features are the custom
> > > "keywords/phrases"? Basically, assume that I have a bag of words
> > > and phrases, and I want the classification to be based on them.
> > >
> > > How can we implement such a problem in Mahout? Is there an
> > > existing algorithm which I can use?
> > >
> > > Thanks
> > > Stuti Awasthi
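
Editor's note: the idea of document vectors whose features are custom keywords and phrases can be sketched in a few lines. The example below is plain Python, not a Mahout API. The KEYWORDS list and the keyword_vector function are hypothetical, since the thread does not give the actual vocabulary. Each document becomes a vector of whole-word, case-insensitive match counts, one entry per keyword or phrase.

```python
import re

# Hypothetical keyword/phrase vocabulary; the thread does not list the real one.
KEYWORDS = ["does not work", "error", "workaround", "fixed by", "what if", "suggest"]

def keyword_vector(text, keywords=KEYWORDS):
    """Count whole-word, case-insensitive occurrences of each keyword or phrase."""
    lowered = text.lower()
    vector = []
    for phrase in keywords:
        # \b anchors keep "error" from matching inside "errors" or "terror".
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        vector.append(len(re.findall(pattern, lowered)))
    return vector

doc = "The export does not work and throws an error. A workaround is to retry."
print(keyword_vector(doc))
```

Vectors of this shape have a fixed length equal to the vocabulary size, so growing the keyword list later means rebuilding the vectors and retraining whatever classifier consumes them.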
