mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Creating Vectors from Text
Date Fri, 03 Jul 2009 12:13:21 GMT

On Jul 2, 2009, at 12:09 PM, Allan Roberto Avendano Sudario wrote:

> Regards,
> This is the entire exception message:
>
>
> java -cp $JAVACLASSPATH org.apache.mahout.utils.vectors.Driver \
>   --dir /home/hadoop/Desktop/<urls>/index --field content \
>   --dictOut /home/hadoop/Desktop/dictionary/dict.txt \
>   --output /home/hadoop/Desktop/dictionary/out.txt --max 50 --norm 2
>
>
> 09/07/02 09:35:47 INFO vectors.Driver: Output File: /home/hadoop/Desktop/dictionary/out.txt
> 09/07/02 09:35:47 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 09/07/02 09:35:47 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 09/07/02 09:35:47 INFO compress.CodecPool: Got brand-new compressor
> Exception in thread "main" java.lang.NullPointerException
>        at org.apache.mahout.utils.vectors.lucene.LuceneIteratable$TDIterator.next(LuceneIteratable.java:111)
>        at org.apache.mahout.utils.vectors.lucene.LuceneIteratable$TDIterator.next(LuceneIteratable.java:82)
>        at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:25)
>        at org.apache.mahout.utils.vectors.Driver.main(Driver.java:204)
>
>
> Well, I used a Nutch crawl index; is that correct? Hmm... I have
> changed to the content field, but nothing happened.
> Possibly the Nutch crawl doesn't have term vectors indexed.

This would be my guess. A small edit to the Nutch code would probably
allow it: just find where it creates a new Field and add the term
vector (TV) option.
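A hedged sketch of what this involves. Assuming the Lucene 2.4-era API that Nutch used at the time, term vectors are enabled per field when the field is constructed, for example `new Field("content", text, Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.YES)`. For fields created without that option, `IndexReader.getTermFreqVector(docId, field)` returns null, which is consistent with the NullPointerException in the trace above. The self-contained Java below does not touch Lucene; it only illustrates, with a made-up dictionary and made-up term counts, how one document's term frequencies become a vector and what `--norm 2` (scaling to unit L2 length) does. The class and method names are hypothetical, raw counts stand in for whatever weighting the Driver actually applies, and the null check shows one way to skip a document that lacks a term vector instead of crashing.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Mahout's or Lucene's actual code.
public class TermVectorSketch {

    /**
     * Maps one document's term frequencies onto a dense vector using a shared
     * dictionary (term -> column index), then scales it to unit L2 length,
     * i.e. the effect of --norm 2. Returns null when the document has no term
     * vector (as Lucene does for fields indexed without one) rather than
     * dereferencing it and throwing a NullPointerException.
     */
    public static double[] toNormalizedVector(Map<String, Integer> termFreqs,
                                              Map<String, Integer> dictionary) {
        if (termFreqs == null) {
            return null; // field was indexed without term vectors: skip it
        }
        double[] vector = new double[dictionary.size()];
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            Integer column = dictionary.get(e.getKey());
            if (column != null) { // ignore terms missing from the dictionary
                vector[column] = e.getValue();
            }
        }
        // L2 norm: square root of the sum of squared entries
        double sumSquares = 0.0;
        for (double v : vector) {
            sumSquares += v * v;
        }
        double norm = Math.sqrt(sumSquares);
        if (norm > 0) {
            for (int i = 0; i < vector.length; i++) {
                vector[i] /= norm;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary, roughly the role of the file written by --dictOut
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        dictionary.put("hadoop", 0);
        dictionary.put("mahout", 1);
        dictionary.put("nutch", 2);

        // Hypothetical term frequencies for one document's "content" field
        Map<String, Integer> doc = new TreeMap<>();
        doc.put("mahout", 3);
        doc.put("nutch", 4);

        System.out.println(Arrays.toString(toNormalizedVector(doc, dictionary))); // [0.0, 0.6, 0.8]
        System.out.println(toNormalizedVector(null, dictionary)); // null
    }
}
```

With counts 3 and 4 the L2 norm is 5, so the normalized entries are 0.6 and 0.8; a document with no term vector is reported as null and can simply be skipped.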