lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: what is the offsets and payload in DocsAndPositionsEnum for ??
Date Mon, 19 Nov 2012 00:53:36 GMT
On Sun, Nov 18, 2012 at 12:09 PM, wgggfiy <wuqiu.reg@qq.com> wrote:
> I'm now studying Lucene 4.0.
> 1. What are startOffset and endOffset for? Is there a code example?

These are set by the analyzer, to the start and end character offset
for this token (using the OffsetAttribute).  The offsets are used for
highlighting.
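
As a rough illustration of what those two numbers mean, here is a plain-Java
sketch that does not use Lucene at all. The OffsetDemo class, its Token type
and the whitespace splitting are assumptions made up for this example; a real
analyzer records the same kind of values through OffsetAttribute. The start
offset is the index of the token's first character and the end offset is one
past its last character, which is exactly what a highlighter needs to cut the
original text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analyzer: splits on whitespace and records offsets.
final class OffsetDemo {

    // A token together with its character offsets in the original text.
    static final class Token {
        final String text;
        final int startOffset; // index of the first character
        final int endOffset;   // one past the index of the last character

        Token(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Whitespace tokenization that keeps track of where each token came from.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            tokens.add(new Token(input.substring(start, i), start, i));
        }
        return tokens;
    }

    // Highlighting only needs the original text plus the two offsets.
    static String highlight(String input, Token t) {
        return input.substring(0, t.startOffset)
                + "<b>" + input.substring(t.startOffset, t.endOffset) + "</b>"
                + input.substring(t.endOffset);
    }
}
```

Because the offsets point back into the original text, the highlighter can
mark the match even if the indexed term was lowercased or stemmed.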

> 2. What is a payload? I know just a little about it, and that it can be used for
> things like font weight or an enclosing XML tag.

It's an arbitrary per-token-position byte[] that you set during
analysis (using the PayloadAttribute).

> 3. I have an item like (lucene, 350, 450, 33.2, 2), where 350 and 450 are the
> offsets of the term 'lucene', 33.2 is a score, and 2 is some id. My
> question is how I can get it indexed.
> My first idea is to implement my own postings list format, but is it possible
> to do it with the startOffset, endOffset and payload?

You should probably encode them all into the payload; Lucene requires
that the offsets are "in order" (they must not go backwards from one
token to the next), so arbitrary values there may be rejected.
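
One possible way to pack the four values into a single byte[] is sketched
below, using only the JDK. The PayloadCodec class, its Item type and the fixed
16-byte big-endian layout are assumptions for illustration, not anything
Lucene prescribes. The decode method takes (bytes, offset, length) because
that is how a payload is exposed when it comes back as a BytesRef.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs (startOffset, endOffset, score, id) into a payload.
final class PayloadCodec {

    // Two ints, one float, one int: 4 bytes each.
    static final int ENCODED_LENGTH = 16;

    // Decoded view of one payload.
    static final class Item {
        final int startOffset;
        final int endOffset;
        final float score;
        final int id;

        Item(int startOffset, int endOffset, float score, int id) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.score = score;
            this.id = id;
        }
    }

    // Serialize the fields in a fixed order (ByteBuffer is big-endian by default).
    static byte[] encode(int startOffset, int endOffset, float score, int id) {
        return ByteBuffer.allocate(ENCODED_LENGTH)
                .putInt(startOffset)
                .putInt(endOffset)
                .putFloat(score)
                .putInt(id)
                .array();
    }

    // Read the fields back in the same order from a slice of a larger array.
    static Item decode(byte[] bytes, int offset, int length) {
        if (length != ENCODED_LENGTH) {
            throw new IllegalArgumentException("unexpected payload length: " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        int startOffset = buf.getInt();
        int endOffset = buf.getInt();
        float score = buf.getFloat();
        int id = buf.getInt();
        return new Item(startOffset, endOffset, score, id);
    }
}
```

On the indexing side you would wrap the result of
encode(350, 450, 33.2f, 2) in a BytesRef and set it on the PayloadAttribute
from a custom TokenFilter; at search time DocsAndPositionsEnum.getPayload()
returns a BytesRef whose bytes, offset and length can be passed to decode.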

Mike McCandless

http://blog.mikemccandless.com


