mahout-user mailing list archives

From Neil Ghosh <neil.gh...@gmail.com>
Subject Re: Text Classification using Mahout
Date Tue, 28 Sep 2010 17:49:56 GMT
Hi Grant,

I am trying to run the classification example in

http://www.ibm.com/developerworks/java/library/j-mahout/

and am on step 3, ant install.

However, it is trying to download the 2 GB file. I might run out of space on
my Linux partition, and the download may also be interrupted on my connection.

Is there any way I can test the example on a smaller set of Wikipedia data,
or download the data offline?

Thanks
Neil
http://neilghosh.com

On Mon, Sep 27, 2010 at 6:12 PM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Sep 24, 2010, at 1:12 PM, Neil Ghosh wrote:
>
> > Are there any other examples, documents, or references on how to use
> > Mahout for text classification?
> >
> > I went through and ran the following:
> >
> >   1. Wikipedia Bayes Example
> >      <https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html>
> >      - Classify Wikipedia data.
> >
> >   2. Twenty Newsgroups
> >      <https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html>
> >      - Classify the classic Twenty Newsgroups data.
> >
> > However, these two are not very definitive, and there isn't much
> > explanation of the examples. Please share any further documentation.
>
>
> What kinds of problems are you looking to solve?  In general, we don't have
> too much in the way of special things for text other than we have various
> utilities for converting text into Mahout's vector format based on various
> weighting schemes.  Both of those examples just take and convert the text
> into vectors and then either train or test on them.  I would agree, though,
> that a good tutorial is needed.  It's a bit out of date in terms of the
> actual commands, but I believe the concepts are still accurate:
> http://www.ibm.com/developerworks/java/library/j-mahout/
>
> See
> https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+Wiki#MahoutWiki-ImplementationBackground
> (and the creating vectors section).  Also see the Algorithms section.
>
>
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
>
>
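For readers new to the idea, the step Grant describes (converting text into vectors using a weighting scheme before training or testing a classifier) can be sketched in plain Java with TF-IDF weighting, one common such scheme. This is a minimal, self-contained illustration under stated assumptions, not Mahout's API: the tokenizer (lowercase, split on non-letters), the weight formula tf * ln(N / df), and the class and method names (TfIdfSketch, tokenize, tfidf) are all hypothetical choices made for the example.

```java
import java.util.*;

public class TfIdfSketch {
    // Hypothetical tokenizer for illustration only: lowercase the text and
    // split on runs of non-letter characters, dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns one sparse vector (term -> weight) per document, where
    // weight = tf * ln(N / df): tf is the term's count in the document,
    // N the number of documents, df the number of documents containing it.
    static List<Map<String, Double>> tfidf(List<String> docs) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String tok : tokenize(doc)) tf.merge(tok, 1, Integer::sum);
            // Each distinct term in this document bumps its document frequency once.
            for (String term : tf.keySet()) df.merge(term, 1, Integer::sum);
            counts.add(tf);
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : counts) {
            Map<String, Double> vec = new TreeMap<>(); // sorted for readable output
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                vec.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "the cat sat on the mat",
                "the dog chased the cat",
                "stocks fell on the news");
        for (Map<String, Double> v : tfidf(docs)) System.out.println(v);
    }
}
```

Running main prints one sparse map per document. The term "the" gets weight 0.0 because it occurs in every document (ln(N / df) = ln(1) = 0), while rarer terms such as "stocks" get larger weights; a classifier is then trained or tested on vectors like these.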


