lucene-java-user mailing list archives

From Christoph Hermann <herm...@informatik.uni-freiburg.de>
Subject Storing additional Metadata with Fields
Date Thu, 14 Oct 2010 10:17:23 GMT
Hi,

Is there a way to store additional metadata with fields?

My problem is as follows:
I'm extracting extended HTML with Tika. This extended HTML contains references 
to pages, x/y coordinates of the text, etc. I want to be able to retrieve those 
values when a search matches the text.

So when creating the Document, I store one Field for every part of the text 
content of the document I'm currently indexing (let's call it "content").

Example:
I have the following content:
<html><body>
<span page="1" x="1" y="1">This is a very</span>
<span page="1" x="1" y="2">interesting text.</span>
<span page="2" x="1" y="1">This is boring text</span>
</body></html>

So I would store the following:

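// One stored, analyzed "content" field per text fragment
// (Field.Index has no YES constant; ANALYZED tokenizes the value for search).
doc.add(new Field("content", "This is a very", Field.Store.YES, 
    Field.Index.ANALYZED));
doc.add(new Field("content", "interesting text", Field.Store.YES, 
    Field.Index.ANALYZED));
doc.add(new Field("content", "This is boring text", Field.Store.YES, 
    Field.Index.ANALYZED));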

Is there any way to include the page,x,y values in there?
I'd like to display the page when retrieving the results.

I thought about storing the same field twice, adding the page, x, y values at 
the beginning of the second copy and extracting them again when retrieving the 
field, but maybe there's a better way?
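
To make the prefix idea concrete, here is a minimal sketch of the encoding and 
decoding step in plain Java, without any Lucene calls. The class name 
PositionedText and the "page;x;y|text" layout are assumptions made up for this 
illustration, not anything Lucene or Tika prescribes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: packs page/x/y into a prefix of the stored value. */
final class PositionedText {
    // Assumed layout for illustration: "page;x;y|text".
    // DOTALL lets the text part contain line breaks.
    private static final Pattern ENCODED =
            Pattern.compile("(\\d+);(\\d+);(\\d+)\\|(.*)", Pattern.DOTALL);

    final int page;
    final int x;
    final int y;
    final String text;

    PositionedText(int page, int x, int y, String text) {
        this.page = page;
        this.x = x;
        this.y = y;
        this.text = text;
    }

    /** Builds the string that would be put into the stored field. */
    String encode() {
        return page + ";" + x + ";" + y + "|" + text;
    }

    /** Splits a stored value back into its position and its text. */
    static PositionedText decode(String stored) {
        Matcher m = ENCODED.matcher(stored);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an encoded value: " + stored);
        }
        return new PositionedText(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                m.group(4));
    }
}
```

With something like this, the encoded string would probably go into a second 
field that is stored but not indexed (Field.Index.NO), so the numeric prefix 
never shows up as a search term, while the plain "content" field stays 
analyzed for searching.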

regards
Christoph Hermann

-- 
Christoph Hermann
Institut für Informatik
Tel: +49 761-203-8171 Fax: +49 761-203-8162
e-mail: hermann@informatik.uni-freiburg.de
