lucene-java-user mailing list archives

From "Cedric Ho" <cedric...@gmail.com>
Subject Re: How to pass additional information into Similarity.scorePayload(...)
Date Fri, 15 Feb 2008 08:45:58 GMT
Hi Paul,

Do you mean the following?

For example, to index this: "first second third <paragraphBorder> fourth fifth sixth"

Originally it would be indexed as:
(first,0) (second,1) (third,2) (fourth,3) (fifth,4) (sixth,5)

Now it would be:
(first,0) (second,0) (third,0) (fourth,1) (fifth,1) (sixth,1)
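A minimal plain-Java sketch of the two position assignments above. It assumes whitespace tokenization and a literal `<paragraphBorder>` token as the border marker (both hypothetical), and it does not use Lucene's API. It only shows the arithmetic: in the ordinary field every word advances the position by one, while in the paragraph field the position advances only at a border, so it equals the paragraph number. In Lucene itself this would presumably be done in a TokenFilter that sets a position increment of 0 within a paragraph and 1 at a border, as the quoted suggestion below describes.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphPositions {
    // Hypothetical marker token standing in for the paragraph border.
    static final String BORDER = "<paragraphBorder>";

    // Ordinary field: every word advances the position by one.
    static List<Integer> wordPositions(String[] tokens) {
        List<Integer> positions = new ArrayList<>();
        int pos = 0;
        for (String t : tokens) {
            if (t.equals(BORDER)) {
                continue; // the border marker itself is not indexed
            }
            positions.add(pos++);
        }
        return positions;
    }

    // Extra "paragraph" field: the position only advances at a border,
    // so each word's position is its paragraph number.
    static List<Integer> paragraphPositions(String[] tokens) {
        List<Integer> positions = new ArrayList<>();
        int paragraph = 0;
        for (String t : tokens) {
            if (t.equals(BORDER)) {
                paragraph++;
                continue;
            }
            positions.add(paragraph);
        }
        return positions;
    }

    public static void main(String[] args) {
        String[] tokens =
            "first second third <paragraphBorder> fourth fifth sixth".split(" ");
        System.out.println(wordPositions(tokens));      // [0, 1, 2, 3, 4, 5]
        System.out.println(paragraphPositions(tokens)); // [0, 0, 0, 1, 1, 1]
    }
}
```

Keeping both lists mirrors the idea of putting the paragraph numbers in an extra field: the ordinary field's consecutive positions are left as they were.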

Then the Query classes that depend on positional information
(PhraseQuery, SpanQueries) won't work anymore? Unfortunately, I need
those Query classes as well.

Cedric


>  For each word in the input stream make sure that the position
>  at which it is indexed in an extra field is the same as the paragraph
>  number. That will involve only allowing a position increment at
>  a paragraph border during indexing.
>  Call this extra field the paragraph field if you will.
>
>  Then, during search, search for a Term in paragraph field, and
>  use the position from that field, i.e. the paragraph number
>  to find a weight for the found term.
>  Have a look at PhraseQuery to see how to use term positions during
>  search. It computes relative positions, but it works on the absolute
>  positions that it gets from the index.
>
>  SpanFirstQuery also allows doing that; it's a bit more involved, but
>  in the end it works from the same absolute positions from the index.
>  The version at the JIRA issue will even allow using the length of the
>  matching spans as the absolute paragraph number, which, in turn,
>  allows the use of a Similarity for the paragraph weights [10/5/2].
>
>  There is nothing special about indexed term positions; any term can
>  be indexed at any position in a field. Lucene will take advantage of
>  the incremental nature of positions by storing only compressed
>  differences of positions in the index, but during search the original
>  positions are directly available. You can do the same with payloads,
>  but why reimplement something that is already available?
>
>  Payloads have better uses than positional info, for one they are
>  great to avoid disjunctions. For example for verbs, one could
>  index only the stem and use a payload for the actual inflected
>  form (singular/plural, past/present, first/second/third person, etc).
>
>  Regards,
>  Paul Elschot
>
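A minimal plain-Java sketch of the search-side idea in the quoted suggestion: positions stored as differences are turned back into absolute positions, and in the paragraph field an absolute position is a paragraph number that can be mapped to a weight. The weight table reads the "[10/5/2]" example as paragraph 0 → 10, paragraph 1 → 5, paragraph 2 and later → 2; that reading, the class and method names, and taking the best weight per document are all assumptions for illustration. This is not Lucene code; in Lucene the absolute positions would come from the index during search (e.g. via TermPositions.nextPosition()), not from a hand-written decoder.

```java
import java.util.Arrays;

public class ParagraphWeights {
    // Hypothetical per-paragraph weights (assumed reading of [10/5/2]):
    // paragraph 0 -> 10, paragraph 1 -> 5, paragraph 2 and later -> 2.
    static final float[] WEIGHTS = {10f, 5f, 2f};

    // Rebuild absolute positions from stored differences (gaps),
    // mimicking how a delta-compressed position list is read back.
    static int[] decodeDeltas(int[] deltas) {
        int[] absolute = new int[deltas.length];
        int pos = 0;
        for (int i = 0; i < deltas.length; i++) {
            pos += deltas[i];
            absolute[i] = pos;
        }
        return absolute;
    }

    // In the paragraph field, a term's position is its paragraph number.
    static float weightFor(int paragraph) {
        return WEIGHTS[Math.min(paragraph, WEIGHTS.length - 1)];
    }

    // Hypothetical aggregation: keep the best weight over all occurrences.
    static float bestWeight(int[] paragraphPositions) {
        float best = 0f;
        for (int p : paragraphPositions) {
            best = Math.max(best, weightFor(p));
        }
        return best;
    }

    public static void main(String[] args) {
        int[] deltas = {1, 0, 2};             // stored gaps
        int[] paragraphs = decodeDeltas(deltas);
        System.out.println(Arrays.toString(paragraphs)); // [1, 1, 3]
        System.out.println(bestWeight(paragraphs));      // 5.0
    }
}
```

Summing the weights instead of taking the maximum would reward terms that occur in several paragraphs; which choice fits depends on the scoring wanted.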
