mahout-dev mailing list archives

From "jayghost (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAHOUT-1034) ERROR in Navie Bayes Training(trainnb)
Date Mon, 09 Jul 2012 15:23:35 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409560#comment-13409560 ]

jayghost edited comment on MAHOUT-1034 at 7/9/12 3:22 PM:
----------------------------------------------------------

I tried passing numLabels as a generic option with -D, but that produces a different error.

{hadoop@master:~/program/mahout-distribution-0.7$ bin/mahout trainnb -D numLabels=5000 -i ~/Downloads/20news-bydate/20news-bydate-train-vectors/tfidf-vectors -o ~/Downloads/20news-bydate/model/ -el -li ~/Downloads/20news-bydate/labelindex -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/hadoop/program/hadoop-1.0.1/bin/hadoop and HADOOP_CONF_DIR=/home/hadoop/program/hadoop-1.0.1/conf
MAHOUT-JOB: /home/hadoop/program/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

12/07/09 23:18:27 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
12/07/09 23:18:27 ERROR common.AbstractJob: Unexpected /home/hadoop/Downloads/20news-bydate/model/ while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
 -archives <paths>              comma separated archives to be unarchived
                                on the compute machines.
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -files <paths>                 comma separated files to be copied to the
                                map reduce cluster
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -libjars <paths>               comma separated jar files to include in
                                the classpath.
 -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected /home/hadoop/Downloads/20news-bydate/model/ while processing Job-Specific Options:
Usage:
 [--input <input> --output <output> --labels <labels> --extractLabels --alphaI <alphaI> --trainComplementary --labelIndex <labelIndex> --overwrite --help --tempDir <tempDir> --startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:                                                           
  --input (-i) input               Path to job input directory.                 
  --output (-o) output             The directory pathname for output.           
  --labels (-l) labels             comma-separated list of labels to include in 
                                   training                                     
  --extractLabels (-el)            Extract the labels from the input            
  --alphaI (-a) alphaI             smoothing parameter                          
  --trainComplementary (-c)        train complementary?                         
  --labelIndex (-li) labelIndex    The path to store the label index in         
  --overwrite (-ow)                If present, overwrite the output directory   
                                   before running job                           
  --help (-h)                      Print out help                               
  --tempDir tempDir                Intermediate output directory                
  --startPhase startPhase          First phase to run                           
  --endPhase endPhase              Last phase to run                            
12/07/09 23:18:27 INFO driver.MahoutDriver: Program took 436 ms (Minutes: 0.007266666666666667)}

How can I add the numLabels option? Please help. Thanks!
                
      was (Author: jayghost):
    I tried passing numLabels as a generic option with -D, but that produces a different error.

{hadoop@master:~/program/mahout-distribution-0.7$ bin/mahout trainnb -D numLabels=5000 -i ~/Downloads/20news-bydate/20news-bydate-train-vectors/tfidf-vectors -o ~/Downloads/20news-bydate/model -el -li ~/Downloads/20news-bydate/labelindex -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/hadoop/program/hadoop-1.0.1/bin/hadoop and HADOOP_CONF_DIR=/home/hadoop/program/hadoop-1.0.1/conf
MAHOUT-JOB: /home/hadoop/program/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

12/07/09 23:18:27 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
12/07/09 23:18:27 ERROR common.AbstractJob: Unexpected /home/hadoop/Downloads/20news-bydate/model while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
 -archives <paths>              comma separated archives to be unarchived
                                on the compute machines.
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -files <paths>                 comma separated files to be copied to the
                                map reduce cluster
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -libjars <paths>               comma separated jar files to include in
                                the classpath.
 -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected /home/hadoop/Downloads/20news-bydate/model/ while processing Job-Specific Options:
Usage:
 [--input <input> --output <output> --labels <labels> --extractLabels --alphaI <alphaI> --trainComplementary --labelIndex <labelIndex> --overwrite --help --tempDir <tempDir> --startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:                                                           
  --input (-i) input               Path to job input directory.                 
  --output (-o) output             The directory pathname for output.           
  --labels (-l) labels             comma-separated list of labels to include in 
                                   training                                     
  --extractLabels (-el)            Extract the labels from the input            
  --alphaI (-a) alphaI             smoothing parameter                          
  --trainComplementary (-c)        train complementary?                         
  --labelIndex (-li) labelIndex    The path to store the label index in         
  --overwrite (-ow)                If present, overwrite the output directory   
                                   before running job                           
  --help (-h)                      Print out help                               
  --tempDir tempDir                Intermediate output directory                
  --startPhase startPhase          First phase to run                           
  --endPhase endPhase              Last phase to run                            
12/07/09 23:18:27 INFO driver.MahoutDriver: Program took 436 ms (Minutes: 0.007266666666666667)
}

How can I add the numLabels option? Please help. Thanks!
                  
> ERROR in Navie Bayes Training(trainnb)
> --------------------------------------
>
>                 Key: MAHOUT-1034
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1034
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.7
>         Environment: Ubuntu 11.04
>            Reporter: Leting Wu
>            Priority: Critical
>
> When running either examples/classify-20newsgroups.sh or asf-email-examples.sh, trainnb always fails:
> {noformat}
> INFO mapred.JobClient: Task Id : attempt_201206281546_0003_m_000000_0, Status : FAILED
> java.lang.IllegalArgumentException
> 	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
> 	at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:264)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
