mahout-user mailing list archives

From Shashikant Kore <shashik...@gmail.com>
Subject Re: CorruptIndexException or NullPointerException when creating vectors from Lucene
Date Fri, 15 Jan 2010 06:01:58 GMT
The first problem seems to be an index version incompatibility.

Since you created the index with Lucene 3.0, you will need the same
version to read it. The message "Incompatible format version: 2
expected 1 or lower" suggests that the Lucene version on the classpath
when you create the vectors is older than that. Can you check whether
you are using the same Lucene jar when creating the vectors?
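
One way to check is to ask the JVM where it actually loaded a Lucene
class from. The snippet below is only a rough sketch, not part of
Mahout or Lucene: the class name WhichJar and the helper locationOf
are made up for illustration, and it assumes you run it with the same
classpath you use for the Driver. It defaults to
org.apache.lucene.index.IndexReader because that class appears in your
stack trace.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Mahout or Lucene): reports which jar or
// directory a class was loaded from, to spot an older Lucene on the classpath.
public class WhichJar {

    /** Returns the location the named class was loaded from, or a short note if unknown. */
    static String locationOf(String className) {
        try {
            // Load without initializing, so static initializers do not run.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // JDK classes loaded by the bootstrap loader have no code source.
                return "(bootstrap or unknown source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Defaults to the Lucene class seen in the stack trace.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexReader";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

If the printed location is not the lucene-core jar from your 3.0.0
install, that would point to the mismatch.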

Not sure what the second problem is.

--shashi

On Fri, Jan 15, 2010 at 11:11 AM, Rob Ennals <rob.ennals@gmail.com> wrote:
> Hi Guys,
>
> I'm totally new to Mahout so I'm running into what I expect are newbie issues.
>
> To get started with clustering, I tried importing some indexes from Lucene.
>
> Following the Lucene tutorial, I created a really simple index of the
> Lucene source code:
> http://lucene.apache.org/java/3_0_0/demo.html
>
> I then tried to convert this to a Mahout Vector, following
> http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
>
> This gives me a CorruptIndexException:
>
> rob@rob:~/svn/mahout$ java
> org.apache.mahout.utils.vectors.lucene.Driver --dir
> /home/rob/Reference/Installers/lucene-3.0.0/index --output
> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
> contents
> Exception in thread "main"
> org.apache.lucene.index.CorruptIndexException: Incompatible format
> version: 2 expected 1 or lower
>        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:117)
>        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
>        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:104)
>        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
>        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
>        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
>        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:314)
>        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:140)
>
>
> I also tried running the driver on the actual Lucene index that I want
> to apply it to, and this time got a NullPointerException:
>
> rob@rob:~/svn/mahout$ java
> org.apache.mahout.utils.vectors.lucene.Driver --dir
> /home/rob/git/thinklink/scala/bin/index/ --output
> /home/rob/test/output --dictOut /home/rob/test/dict --max 50 --field
> contents
> Jan 14, 2010 9:40:40 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Output File: /home/rob/test/output
> Exception in thread "main" java.lang.NullPointerException
>        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
>        at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1074)
>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:397)
>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
>        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:265)
>        at org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(Driver.java:226)
>        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
>
>
> In both cases, the indexes should have the "contents" field.
>
>
> I assume I'm doing something stupid here. If someone can tell me what
> that is, then that would be great.
>
>
> Thanks
>
> -Rob
>
