lucene-java-user mailing list archives

From Pradeep Singh <pksing...@gmail.com>
Subject Re: Storing additional Metadata with Fields
Date Thu, 14 Oct 2010 10:29:43 GMT
Payload!!
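
To expand on that one-word answer a little: a Lucene payload is an arbitrary byte array stored with a term position, so the page, x and y values have to be packed into bytes at index time and unpacked at search time. Below is a minimal sketch of just that packing step, using only the JDK. The class name PositionPayload and the 12-byte layout (three big-endian ints) are assumptions made for illustration, not something Lucene prescribes. Attaching the bytes to tokens (in Lucene 3.x this is typically done from a TokenFilter via PayloadAttribute) is not shown here.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs (page, x, y) into a byte array that could be
// used as a payload, and unpacks it again. The layout is an assumption.
final class PositionPayload {
    private static final int SIZE = 3 * 4; // three 4-byte ints

    // Encode page, x, y as three big-endian ints (ByteBuffer's default order).
    static byte[] encode(int page, int x, int y) {
        return ByteBuffer.allocate(SIZE).putInt(page).putInt(x).putInt(y).array();
    }

    // Decode the bytes back into {page, x, y}.
    static int[] decode(byte[] data) {
        if (data == null || data.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt() };
    }

    public static void main(String[] args) {
        byte[] payload = encode(2, 1, 1);
        int[] v = decode(payload);
        System.out.println("page=" + v[0] + " x=" + v[1] + " y=" + v[2]);
    }
}
```

A fixed-width layout keeps decoding trivial. If index size matters, a variable-length encoding would be smaller, at the cost of a slightly more involved decoder.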

2010/10/14 Christoph Hermann <hermann@informatik.uni-freiburg.de>

> Hi,
>
> Is there a way to store additional metadata with fields?
>
> My problem is as follows:
> I'm extracting extended HTML with Tika. This extended HTML contains
> references to pages, x,y values of the text, etc. I want to be able to
> retrieve those values when the text is found while searching.
>
> So when creating the Document, I'm storing a Field for every part of the
> text content of the document I'm currently indexing (let's call it
> "content").
>
> Example:
> I have the following content:
> <html><body>
> <span page="1" x="1" y="1">This is a very</span>
> <span page="1" x="1" y="2">interesting text.</span>
> <span page="2" x="1" y="1">This is boring text</span>
> </body></html>
>
> So I would store the following:
>
> doc.add(new Field("content", "This is a very",
>     Field.Store.YES, Field.Index.ANALYZED));
> doc.add(new Field("content", "interesting text",
>     Field.Store.YES, Field.Index.ANALYZED));
> doc.add(new Field("content", "This is boring text",
>     Field.Store.YES, Field.Index.ANALYZED));
>
> Is there any way to include the page,x,y values in there?
> I'd like to display the page when retrieving the results.
>
> I thought about storing the same field twice, adding the page,x,y values
> at the beginning of the Field and extracting them again when retrieving
> the field, but maybe there's a better way?
>
> regards
> Christoph Hermann
>
> --
> Christoph Hermann
> Institut für Informatik
> Tel: +49 761-203-8171 Fax: +49 761-203-8162
> e-mail: hermann@informatik.uni-freiburg.de
>
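
For comparison, the prefix idea from the quoted message (put page,x,y at the front of a stored copy of the text and strip it off again on retrieval) can be sketched as below. The class name PrefixedSpan and the "page,x,y|text" format are assumptions for illustration only. In Lucene such a copy would presumably be stored but not indexed, so that the prefix does not end up in the searchable terms.

```java
// Hypothetical helper for the "prefix the stored value" approach.
// Assumed format: "page,x,y|text", e.g. "1,1,2|interesting text."
final class PrefixedSpan {
    final int page;
    final int x;
    final int y;
    final String text;

    PrefixedSpan(int page, int x, int y, String text) {
        this.page = page;
        this.x = x;
        this.y = y;
        this.text = text;
    }

    // Build the string that would go into the stored field.
    String encode() {
        return page + "," + x + "," + y + "|" + text;
    }

    // Split a stored value back into its metadata and the original text.
    // Only the first '|' is treated as the separator, so the text itself
    // may contain '|' characters.
    static PrefixedSpan parse(String stored) {
        int bar = stored.indexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("missing '|' separator: " + stored);
        }
        String[] nums = stored.substring(0, bar).split(",");
        if (nums.length != 3) {
            throw new IllegalArgumentException("expected page,x,y prefix: " + stored);
        }
        return new PrefixedSpan(
                Integer.parseInt(nums[0].trim()),
                Integer.parseInt(nums[1].trim()),
                Integer.parseInt(nums[2].trim()),
                stored.substring(bar + 1));
    }

    public static void main(String[] args) {
        String stored = new PrefixedSpan(1, 1, 2, "interesting text.").encode();
        PrefixedSpan span = parse(stored);
        System.out.println("page=" + span.page + " x=" + span.x
                + " y=" + span.y + " text=" + span.text);
    }
}
```

This is simpler than payloads but ties the metadata to a whole stored value rather than to individual term positions.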
