Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (hermes.apache.org: local policy)
From: "Aditya Gollakota" <aditya.gollakota@customware.net>
To: <java-user@lucene.apache.org>
Subject: Using Lucene to index Meta-data from txt, html, PDF etc files.
Date: Thu, 14 Sep 2006 18:50:11 +1000
Message-ID: <05d001c6d7da$c9e04b80$12023c0a@mousse>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_05D1_01C6D82E.9B8C5B80"
Thread-Index: AcbX2sl551DyQ/JzShS/vAuyaw3Sbw==

------=_NextPart_000_05D1_01C6D82E.9B8C5B80
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Guys,

 
Just wondering how you would go about indexing meta-data from files. I've
used IndexHTML.java from the demo package and have updated
HTMLDocument.java with the following:

 
DataInput input = new DataInputStream(
    new BufferedInputStream(new FileInputStream(f)));
Content content = Content.read(input);
Reader contentReader = new ArrayFile.Reader(new LocalFileSystem(null),
    new File(f.getPath(), Content.DIR_NAME).toString(), null);

System.out.println(content);

ParseData parseData = ParseData.read(input);
Metadata metadata = parseData.getContentMeta();

doc.add(new Field("keywords", metadata.KEYWORDS,
    Field.Store.YES, Field.Index.NO));

 
I'm using nutch-0.8.jar for the Metadata class, have added the other Nutch
jars to resolve any exceptions, and am on Lucene 2.0.0.

 
When I run this code, I get the following error:

 
A record version mismatch occurred. Expecting v1, found v118.

 
Any help would be much appreciated.

 
Regards,

 
Aditya Gollakota
Support Engineer | CustomWare Asia Pacific | www.customware.net
T: +61 2 9900 5742 | F: +61 2 9475 0100 | M: +61 405 033 951
E: aditya.gollakota@customware.net

 
------=_NextPart_000_05D1_01C6D82E.9B8C5B80--