mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: mahout exception (lucene.vector)
Date Fri, 09 Dec 2011 15:29:03 GMT
The Lucene Driver class keeps track of how many docs don't have term vectors for the
requested field and exits once a threshold is reached.  You can control the threshold using
the maxPercentErrorDocs input argument.  Despite the name, the argument is a fraction,
expressed as a number between 0 and 1.  The default is 0.  If you think only a few docs are
missing term vectors, you can set a higher threshold.  In practice, though, this error
probably means term vectors are not turned on for that field in your index, since it is
typically an all-or-nothing thing.
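
To make the threshold behaviour concrete, here is a minimal Java sketch of that kind of
check.  It is not Mahout's actual LuceneIterator code: the class name
MissingTermVectorGuard, its methods, and the exact boundary comparison are assumptions made
for illustration.  Only the maxPercentErrorDocs name, the 0-to-1 range, the default of 0,
and the exception message come from this thread.

```java
/**
 * Hypothetical sketch of a "too many docs without term vectors" check.
 * Names and the exact boundary comparison are assumptions, not Mahout's code.
 */
final class MissingTermVectorGuard {
    private final String field;
    private final int maxErrorDocs;   // absolute number of tolerated docs
    private int numErrorDocs;         // docs seen so far without a term vector

    MissingTermVectorGuard(String field, int numDocs, double maxPercentErrorDocs) {
        if (maxPercentErrorDocs < 0.0 || maxPercentErrorDocs > 1.0) {
            throw new IllegalArgumentException("maxPercentErrorDocs must be between 0 and 1");
        }
        this.field = field;
        // Turn the fraction into a document count for this index.
        this.maxErrorDocs = (int) (maxPercentErrorDocs * numDocs);
    }

    /** Call once for every document that has no term vector for the field. */
    void recordMissing() {
        numErrorDocs++;
        // Assumed comparison: fail as soon as the tolerated count is exceeded.
        if (numErrorDocs > maxErrorDocs) {
            throw new IllegalStateException(
                "There are too many documents that do not have a term vector for " + field);
        }
    }

    int missingCount() {
        return numErrorDocs;
    }
}
```

Under these assumptions, a guard built with the default of 0 throws on the first document
that lacks a term vector, which matches the all-or-nothing case above.  A guard built with
0.5 for a 4-document index tolerates two such documents and throws on the third.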


On Dec 7, 2011, at 6:25 PM, michzel wrote:

> Hello, first, thanks to Sean Owen for answering my email so quickly and pointing
> out my mistake. But when I ran the command as follows, an exception occurred:
> bin/mahout lucene.vector --dir /home/michzel/index --output
> /home/michzel/part-out.vec --field contents --dictOut /home/michzel/dict.out
> --norm 2
> Running on hadoop, using HADOOP_HOME=/var/hadoop
> HADOOP_CONF_DIR=/var/hadoop/conf
> 11/12/08 08:52:21 WARN driver.MahoutDriver: No lucene.vector.props found on
> classpath, will use command-line arguments only
> 11/12/08 08:52:21 INFO lucene.Driver: Output File:
> /home/michzel/part-out.vec
> 11/12/08 08:52:21 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 11/12/08 08:52:21 INFO zlib.ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 11/12/08 08:52:21 INFO compress.CodecPool: Got brand-new compressor
> 11/12/08 08:52:21 ERROR lucene.LuceneIterator: There are too many documents
> that do not have a term vector for contents
> Exception in thread "main" java.lang.IllegalStateException: There are too
> many documents that do not have a term vector for contents
> 	at
> org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:114)
> 	at
> org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:41)
> 	at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
> 	at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
> 	at
> org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
> 	at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:206)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:616)
> 	at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:616)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> 
> When I change the "--field" to another field, "--field filename", the program
> succeeds. I wonder what happened? Please help me, thanks a lot.
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/mahout-exception-lucene-vector-tp3569144p3569144.html
> Sent from the Mahout User List mailing list archive at Nabble.com.

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com



