mahout-commits mailing list archives

From conflue...@apache.org
Subject [CONF] Apache Lucene Mahout: TwentyNewsgroups (page edited)
Date Fri, 07 Nov 2008 17:39:00 GMT
TwentyNewsgroups (MAHOUT) edited by Grant Ingersoll
      Page: http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups
   Changes: http://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=99739&originalVersion=4&revisedVersion=5






Content:
---------------------------------------------------------------------

h1. Twenty Newsgroups Classification

[Get Mahout|http://cwiki.apache.org/confluence/display/MAHOUT/index#index-Installation%2FSetup]


Assume MAHOUT_HOME refers to the directory where you checked out or installed Mahout.

After downloading the distribution, unzip or untar it into a directory of your choice, then run the following from MAHOUT_HOME:

h2. Setup:

# cd examples
# ant get-files  
# ant extract-20news-18828 
# ant job

Then, from your Hadoop installation directory:

# Edit conf/hadoop-site.xml (for example, with emacs) to add local settings, as described in the [quickstart|http://hadoop.apache.org/core/docs/current/quickstart.html]
# bin/hadoop namenode -format (formats HDFS)
# bin/start-all.sh (starts Hadoop)
# bin/hadoop dfs -put <MAHOUT_HOME>/work/20news-18828-collapse 20newsInput (copies the extracted text into HDFS)

h2. Bayes
To train the Bayes classifier using trigrams:
{code}hadoop jar <MAHOUT_HOME>/examples/build/apache-mahout-examples-0.1-dev.jar org.apache.mahout.classifier.bayes.TrainClassifier \
-t -i 20newsInput -o newsmodel -ng 3 -type bayes{code}
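As a rough illustration of what "-ng 3 -type bayes" asks for, here is a minimal Python sketch of multinomial naive Bayes over word n-grams with add-one (Laplace) smoothing. It is not Mahout's code. The function names word_ngrams, train_bayes and classify are hypothetical, and the assumption that a gram size of 3 also keeps unigrams and bigrams is not confirmed by this page.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, max_n):
    """Return all word n-grams of length 1..max_n.

    Assumption: a gram size of max_n also keeps the lower-order grams.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def train_bayes(docs, max_n):
    """docs: list of (label, text) pairs.

    Returns {label: (log_prior, {gram: log_likelihood})} using add-one smoothing.
    """
    doc_counts = Counter()
    gram_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        grams = word_ngrams(text, max_n)
        doc_counts[label] += 1
        gram_counts[label].update(grams)
        vocab.update(grams)
    total_docs = sum(doc_counts.values())
    model = {}
    for label, n_docs in doc_counts.items():
        # Smoothed denominator: grams seen in this class plus vocabulary size.
        denom = sum(gram_counts[label].values()) + len(vocab)
        likelihoods = {g: math.log((gram_counts[label][g] + 1) / denom) for g in vocab}
        model[label] = (math.log(n_docs / total_docs), likelihoods)
    return model


def classify(model, text, max_n):
    """Return the label with the highest log posterior; unknown grams are ignored."""
    grams = word_ngrams(text, max_n)

    def score(label):
        log_prior, likelihoods = model[label]
        return log_prior + sum(likelihoods[g] for g in grams if g in likelihoods)

    return max(model, key=score)


# Usage on a toy corpus (labels borrowed from 20 Newsgroups for flavor).
train = [
    ("sci.space", "the shuttle launch was delayed"),
    ("sci.space", "nasa plans another shuttle launch"),
    ("rec.autos", "the engine needs a new oil filter"),
    ("rec.autos", "change the oil in the engine"),
]
model = train_bayes(train, max_n=3)
print(classify(model, "shuttle launch window", max_n=3))  # sci.space
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once documents produce hundreds of n-grams.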

To test the Bayes classifier:
{code}hadoop jar <MAHOUT_HOME>/examples/build/apache-mahout-examples-0.1-dev.jar org.apache.mahout.classifier.bayes.TestClassifier \
-p newsmodel -t work/newstest -ng 3 -type bayes{code}
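This page does not describe what TestClassifier reports. As a sketch under that caveat, evaluating a classifier on held-out documents usually comes down to comparing actual and predicted labels. The hypothetical evaluate and print_confusion functions below compute accuracy and confusion-matrix counts from such pairs. They are not part of Mahout.

```python
from collections import Counter


def evaluate(pairs):
    """pairs: iterable of (actual_label, predicted_label).

    Returns (accuracy, confusion) where confusion[(actual, predicted)] is a count.
    """
    confusion = Counter(pairs)
    total = sum(confusion.values())
    correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    return (correct / total if total else 0.0), confusion


def print_confusion(confusion):
    """Print a confusion matrix: rows are actual labels, columns are predicted labels."""
    labels = sorted({label for pair in confusion for label in pair})
    width = max(len(label) for label in labels) + 2
    print("".ljust(width) + "".join(label.ljust(width) for label in labels))
    for actual in labels:
        row = "".join(str(confusion[(actual, p)]).ljust(width) for p in labels)
        print(actual.ljust(width) + row)


# Hypothetical (actual, predicted) pairs from a held-out test set.
results = [
    ("sci.space", "sci.space"),
    ("rec.autos", "rec.autos"),
    ("rec.autos", "sci.space"),
    ("sci.space", "sci.space"),
]
accuracy, confusion = evaluate(results)
print(accuracy)  # 0.75
print_confusion(confusion)
```

Off-diagonal cells show which newsgroups get confused with each other, which is often more informative than the single accuracy figure.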


h2. Complementary Naive Bayes

To train a CBayes classifier using bigrams:
{code}hadoop jar <MAHOUT_HOME>/examples/build/apache-mahout-examples-0.1-dev.jar org.apache.mahout.classifier.bayes.TrainClassifier \
-t -i 20newsInput -o newsmodel -ng 2 -type cbayes{code}

To test a CBayes classifier using bigrams:
{code}hadoop jar <MAHOUT_HOME>/examples/build/apache-mahout-examples-0.1-dev.jar org.apache.mahout.classifier.bayes.TestClassifier \
-p newsmodel -t work/newstest -ng 2 -type cbayes{code}
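Complement naive Bayes (Rennie et al., ICML 2003) estimates each class's term weights from the documents that are not in that class. It then assigns a document to the class whose complement matches it worst. Below is a minimal Python sketch of that idea. It is not Mahout's cbayes implementation, which may also apply the TF-IDF weighting and weight normalization described in the paper; this sketch omits both. The names word_bigrams, train_cbayes and classify_cbayes are hypothetical, and treating a gram size of 2 as unigrams plus bigrams is an assumption.

```python
import math
from collections import Counter, defaultdict


def word_bigrams(text):
    """Unigrams plus adjacent-word bigrams (assumption about what -ng 2 means)."""
    tokens = text.lower().split()
    return tokens + [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


def train_cbayes(docs, alpha=1.0):
    """docs: list of (label, text) pairs.

    For each class c, term weights come from the documents NOT in c:
    w[c][t] = log((count of t outside c + alpha) / (all terms outside c + alpha * |V|)).
    """
    per_class = defaultdict(Counter)
    for label, text in docs:
        per_class[label].update(word_bigrams(text))
    overall = Counter()
    for counts in per_class.values():
        overall.update(counts)
    vocab = list(overall)
    weights = {}
    for label, counts in per_class.items():
        complement = overall - counts  # term counts from every other class
        denom = sum(complement.values()) + alpha * len(vocab)
        weights[label] = {t: math.log((complement[t] + alpha) / denom) for t in vocab}
    return weights


def classify_cbayes(weights, text):
    """Pick the class whose complement fits the document worst (lowest score).

    This sketch ignores class priors.
    """
    feats = word_bigrams(text)

    def complement_score(label):
        w = weights[label]
        return sum(w[f] for f in feats if f in w)

    return min(weights, key=complement_score)


# Usage on a toy corpus (labels borrowed from 20 Newsgroups for flavor).
train = [
    ("sci.space", "the shuttle launch was delayed"),
    ("sci.space", "nasa plans another shuttle launch"),
    ("rec.autos", "the engine needs a new oil filter"),
    ("rec.autos", "change the oil in the engine"),
]
weights = train_cbayes(train)
print(classify_cbayes(weights, "shuttle launch window"))  # sci.space
```

Because each complement pools many classes, its estimates rest on more data than a single class's counts. This helps when class sizes are uneven, which is the motivation the paper gives.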




---------------------------------------------------------------------
CONFLUENCE INFORMATION
This message is automatically generated by Confluence



