From: Christoph Hermann
To: java-user@lucene.apache.org
Subject: Storing additional Metadata with Fields
Date: Thu, 14 Oct 2010 12:17:23 +0200
Organization: Albert-Ludwigs-Universität Freiburg

Hi,

is there a way to
store additional metadata with fields?

My problem is as follows:
I'm extracting extended HTML with Tika. This extended HTML contains references
to pages, x,y values of the text, etc. I want to be able to retrieve those
values when a search matches the text.

So when creating the Document, I store a Field for every part of the text
content of the document I'm currently indexing (let's call it "content").

Example:
I have the following content:

    This is a very interesting text. This is boring text

So I would store the following:

    doc.add(new Field("content", "This is a very", Field.Store.YES,
        Field.Index.ANALYZED));
    doc.add(new Field("content", "interesting text", Field.Store.YES,
        Field.Index.ANALYZED));
    doc.add(new Field("content", "This is boring text", Field.Store.YES,
        Field.Index.ANALYZED));

Is there any way to include the page,x,y values in there?
I'd like to display the page when retrieving the results.

I thought about storing the same field twice and adding the page,x,y values at
the beginning of the Field, then extracting those values when retrieving the
field, but maybe there's a better way?

regards
Christoph Hermann

-- 
Christoph Hermann
Institut für Informatik
Tel: +49 761-203-8171
Fax: +49 761-203-8162
e-mail: hermann@informatik.uni-freiburg.de
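
The idea mentioned in the message, putting the page,x,y values at the beginning of the stored value and extracting them again on retrieval, could look roughly like the sketch below. It is plain Java and does not touch the Lucene API. The class name PositionedText, the "page,x,y|text" layout, the '|' separator and the use of float coordinates are all assumptions made for illustration; none of them come from the original post.

```java
/**
 * Hypothetical helper (not part of Lucene): a text fragment together with
 * the page and x/y position it was extracted from.
 */
final class PositionedText {
    // Assumed separator between the metadata prefix and the text.
    // The prefix itself only contains digits, commas, dots and signs.
    private static final char SEP = '|';

    final int page;
    final float x;
    final float y;
    final String text;

    PositionedText(int page, float x, float y, String text) {
        this.page = page;
        this.x = x;
        this.y = y;
        this.text = text;
    }

    /** Builds the value to store, in the assumed layout "page,x,y|text". */
    String encode() {
        return page + "," + x + "," + y + SEP + text;
    }

    /** Splits a stored value back into metadata and text. */
    static PositionedText decode(String stored) {
        // The first separator ends the prefix; later ones belong to the text.
        int sep = stored.indexOf(SEP);
        if (sep < 0) {
            throw new IllegalArgumentException("no metadata prefix: " + stored);
        }
        String[] parts = stored.substring(0, sep).split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected page,x,y: " + stored);
        }
        return new PositionedText(
                Integer.parseInt(parts[0]),
                Float.parseFloat(parts[1]),
                Float.parseFloat(parts[2]),
                stored.substring(sep + 1));
    }

    public static void main(String[] args) {
        String stored = new PositionedText(3, 72.0f, 540.5f, "interesting text").encode();
        System.out.println(stored);
        PositionedText back = decode(stored);
        System.out.println("page " + back.page + ": " + back.text);
    }
}
```

If the encoded string went into the same analyzed field, the prefix would presumably be tokenized and become searchable as well. One way around that, again only as an assumption, is to keep the indexed "content" field as it is and put the encoded string into a second, stored-only field (Field.Store.YES with Field.Index.NO in Lucene 3.x).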