lucene-java-user mailing list archives

From Christoph Hermann <herm...@informatik.uni-freiburg.de>
Subject Writing an Analyzer for storing and retrieving a payload (was: Storing additional Metadata with Fields)
Date Fri, 15 Oct 2010 14:13:39 GMT
On Thursday, 14 October 2010, 14:43:41, Christoph Hermann wrote:

Hello,

> It seems a Payload gets added to
> every term in the index, so in my case I would store the x, y and page
> values for every word and grow the index far more than I need.
> Is there any approach for preventing this?
> 
> And when searching, how can I access the payloads when displaying the
> results? I haven't found information on that so far.

Is there any example of how to use payloads?
The questions above also still stand.
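
To make the index-size concern concrete: a payload is just a small byte array attached to a term position, so the x, y and page values mentioned in the quoted question would have to be packed into bytes before being attached. The following is only a minimal plain-Java sketch of such a packing, assuming three 32-bit integers (12 bytes per term). The class name PositionPayload and the byte layout are hypothetical, and no Lucene API is used here.

```java
import java.nio.ByteBuffer;

public class PositionPayload {
    // Hypothetical layout: x, y and page as big-endian 32-bit ints, 12 bytes in total.
    public static byte[] encode(int x, int y, int page) {
        return ByteBuffer.allocate(12).putInt(x).putInt(y).putInt(page).array();
    }

    // Reads the three values back in the same order they were written.
    public static int[] decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt() };
    }

    public static void main(String[] args) {
        byte[] payload = encode(120, 340, 7);
        int[] values = decode(payload);
        System.out.println(payload.length + " bytes: x=" + values[0]
                + " y=" + values[1] + " page=" + values[2]);
    }
}
```

With this layout every term that carries a payload costs 12 extra bytes before any index compression, which is the growth the quoted question worries about. A narrower encoding (for example 16-bit values) would shrink that, if the value ranges allow it.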

My current problem is that I've written a ContentHandler that parses the
extended HTML from Tika and sets boost values on the fields it creates, but it
seems that I need to move all of this into the Analyzer, since setting boosts
on Fields with the same name has no real effect?
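For example:
doc.add(new Field("contents", "foo", Field.Store.NO, Field.Index.ANALYZED)); // doc is a Document; Store/Index flags are assumed
Field bar = new Field("contents", "bar", Field.Store.NO, Field.Index.ANALYZED);
bar.setBoost(1.5f); // setBoost returns void, so it cannot be chained inside add()
doc.add(bar);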

=> results in a single "contents" field with one common boost value?

If I'm correct, how would I proceed to achieve the desired effect?

Put all the HTML from the <body> (from Tika) into one content field and let the
Analyzer do the work?

Is there an example of an Analyzer that uses payloads available somewhere?

Regards,
Christoph Hermann

-- 
Christoph Hermann
Institut für Informatik
Tel: +49 761-203-8171 Fax: +49 761-203-8162
e-mail: hermann@informatik.uni-freiburg.de

