mahout-dev mailing list archives

From Philippe Adjiman <adji...@gmail.com>
Subject issue while running lucene.vector driver in mahout 0.5
Date Sun, 18 Sep 2011 14:48:24 GMT
Hi,

I was trying to generate vectors from a Lucene index using the lucene.vector
driver. It worked fine with Mahout 0.4, but with Mahout 0.5 I get the
following exception:

SEVERE: There are too many documents that do not have a term vector for description
Exception in thread "main" java.lang.IllegalStateException: There are too many documents that do not have a term vector for description
    at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:114)
    at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:41)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
    at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:206)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)

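My Lucene index was created with:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
doc.add(new Field("documentId", documentId, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("content", content, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));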

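The exception names the field `description`, while the indexing code above stores a term vector only on `content` (`documentId` is indexed without one). One plausible reading, not confirmed by the report, is that the driver was asked to vectorize a field that carries no term vectors. The sketch below is a hypothetical, simplified model of that kind of check, not Mahout's `LuceneIterator` code: the class `TermVectorCheck`, the method `collectVectors` and the `maxErrorFraction` threshold rule are illustrative assumptions. It counts documents that lack a term vector for the requested field and throws once that count exceeds the allowed share (requires Java 9+ for `List.of`/`Map.of`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a "missing term vector" check.
// This is NOT Mahout's LuceneIterator source; names and the threshold rule are illustrative.
class TermVectorCheck {

    // Each document maps a field name to its term frequencies. A field indexed
    // without a term vector (like "documentId" above) is simply absent here.
    static List<Map<String, Integer>> collectVectors(
            List<Map<String, Map<String, Integer>>> docs,
            String field,
            double maxErrorFraction) {
        // Number of documents allowed to lack a term vector before giving up.
        long maxErrorDocs = (long) (maxErrorFraction * docs.size());
        long errorDocs = 0;
        List<Map<String, Integer>> vectors = new ArrayList<>();
        for (Map<String, Map<String, Integer>> doc : docs) {
            Map<String, Integer> termVector = doc.get(field);
            if (termVector == null) {
                errorDocs++;
                if (errorDocs > maxErrorDocs) {
                    throw new IllegalStateException(
                            "There are too many documents that do not have a term vector for " + field);
                }
                continue; // tolerated: skip this document
            }
            vectors.add(termVector);
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Two documents shaped like the index above: only "content" has a term vector.
        List<Map<String, Map<String, Integer>>> docs = List.of(
                Map.of("content", Map.of("mahout", 2, "lucene", 1)),
                Map.of("content", Map.of("vector", 3)));

        System.out.println(collectVectors(docs, "content", 0.0).size()); // prints 2
        try {
            collectVectors(docs, "description", 0.0);
        } catch (IllegalStateException e) {
            // prints: There are too many documents that do not have a term vector for description
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, a threshold of 0.0 means a single document without a term vector for the requested field is enough to abort, so asking for a field that was never indexed with `Field.TermVector.YES` fails on the first document.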

If this is a known issue, sorry for the duplicate; otherwise, let me know if I
can help reproduce it.


-Philippe


-- 
Philippe Adjiman | twitter: padjiman | linkedin: il.linkedin.com/in/philippeadjiman | blog: http://philippeadjiman.com/blog
