lucene-java-user mailing list archives

From "Aditya Gollakota" <aditya.gollak...@customware.net>
Subject Using Lucene to index Meta-data from txt, html, PDF etc files.
Date Thu, 14 Sep 2006 08:50:11 GMT
Hi Guys,

Just wondering how you would go about indexing meta-data from files. I've
used IndexHTML.java from the demo package and have updated HTMLDocument.java
with the following:

DataInput input = new DataInputStream(
    new BufferedInputStream(new FileInputStream(f)));
Content content = Content.read(input);
Reader contentReader = new ArrayFile.Reader(new LocalFileSystem(null),
    new File(f.getPath(), Content.DIR_NAME).toString(), null);

System.out.println(content);

ParseData parseData = ParseData.read(input);
Metadata metadata = parseData.getContentMeta();

doc.add(new Field("keywords", metadata.KEYWORDS, Field.Store.YES,
    Field.Index.NO));

I'm using nutch-0.8.jar for the Metadata class, along with the other Nutch
jars it depends on, and Lucene 2.0.0.

When I run this code, I get the following error:

A record version mismatch occurred. Expecting v1, found v118.
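
For context, a message of this shape is what a version-checked record reader typically reports when the first byte of its input is not the version number it expects. 118 happens to be the ASCII code for the letter 'v', which suggests the stream may hold ordinary file bytes rather than a serialized record. The sketch below is a stand-alone illustration of that kind of check, not the actual Nutch or Hadoop code; the class VersionCheckDemo and the method readVersioned are hypothetical names introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class VersionCheckDemo {

    // Hypothetical stand-in for a versioned record reader: the first byte
    // of the stream must equal the expected version, otherwise reading fails.
    static byte readVersioned(DataInput in, byte expected) throws IOException {
        byte found = in.readByte();
        if (found != expected) {
            throw new IOException("A record version mismatch occurred. Expecting v"
                    + expected + ", found v" + found + ".");
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // A well-formed record: version byte 1 followed by payload bytes.
        byte[] record = {1, 42, 43};
        readVersioned(new DataInputStream(new ByteArrayInputStream(record)), (byte) 1);

        // Arbitrary text whose first character is 'v' (byte value 118),
        // standing in for a file that is not a serialized record.
        byte[] text = "version".getBytes(StandardCharsets.US_ASCII);
        try {
            readVersioned(new DataInputStream(new ByteArrayInputStream(text)), (byte) 1);
        } catch (IOException e) {
            // Prints the mismatch message built in readVersioned.
            System.out.println(e.getMessage());
        }
    }
}
```

If this is indeed the cause, the record-reading calls would only succeed on data that was written in the matching serialized format, and a plain txt, html or PDF file opened through FileInputStream would fail at the first byte.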

 

Any help would be much appreciated.

 

Regards,

 

Aditya Gollakota
Support Engineer | CustomWare Asia Pacific | www.customware.net
T: +61 2 9900 5742 | F: +61 2 9475 0100 | M: +61 405 033 951
E: aditya.gollakota@customware.net

 

