From: "Aditya Gollakota"
Subject: Using Lucene to index Meta-data from txt, html, PDF etc files.
Date: Thu, 14 Sep 2006 18:50:11 +1000
Message-ID: <05d001c6d7da$c9e04b80$12023c0a@mousse>

Hi Guys,

Just wondering how you would go about indexing meta-data from files.
I've used the demo package's IndexHTML.java and have updated HTMLDocument.java with the following:

    DataInput input = new DataInputStream(new BufferedInputStream(new FileInputStream(f)));
    Content content = Content.read(input);
    Reader contentReader = new ArrayFile.Reader(new LocalFileSystem(null),
        new File(f.getPath(), Content.DIR_NAME).toString(), null);
    System.out.println(content);
    ParseData parseData = ParseData.read(input);
    Metadata metadata = parseData.getContentMeta();
    doc.add(new Field("keywords", metadata.KEYWORDS, Field.Store.YES, Field.Index.NO));

I'm using nutch-0.8.jar for the Metadata class, have added the other Nutch jars to resolve any exceptions, and am also using Lucene-2.0.0.

While compiling this code, I'm getting the following error:

    A record version mismatch occurred. Expecting v1, found v118.

Any help would be much appreciated.

Regards,

Aditya Gollakota
Support Engineer | CustomWare Asia Pacific | www.customware.net
T: +61 2 9900 5742 | F: +61 2 9475 0100 | M: +61 405 033 951
E: aditya.gollakota@customware.net
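
For context, here is a minimal sketch of how a version-checked record reader can produce a message like the one quoted above. It assumes the reader expects a serialized record whose first byte is a version number, and that it was handed ordinary file bytes instead; whether that is what happens in the code above is not confirmed. The class VersionCheckSketch, the constant EXPECTED_VERSION and the method checkVersion are hypothetical names for illustration and are not part of the Nutch or Lucene APIs. The only fixed detail is that 118 is the ASCII code for the letter 'v'.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class VersionCheckSketch {
    // Hypothetical version byte that a serialized record is assumed to start with.
    static final byte EXPECTED_VERSION = 1;

    // Reads the leading byte as a version number.
    // Returns null when it matches, otherwise a message describing the mismatch.
    static String checkVersion(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte found = in.readByte();
            if (found != EXPECTED_VERSION) {
                return "A record version mismatch occurred. Expecting v"
                        + EXPECTED_VERSION + ", found v" + found + ".";
            }
            return null;
        } catch (IOException e) {
            // Empty input ends up here (EOFException from readByte).
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A record that starts with the expected version byte passes the check.
        byte[] record = {1, 42};
        System.out.println(checkVersion(record));

        // Plain text starting with 'v' (ASCII 118) is read as "version 118".
        byte[] text = "version".getBytes(StandardCharsets.US_ASCII);
        System.out.println(checkVersion(text));
    }
}
```

If this assumption holds, the check fails on the first byte of the input, before any metadata is read, which would point at the input format and not at the Field construction.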