lucene-java-user mailing list archives

From Christopher Tignor <ctig...@thinkmap.com>
Subject Re: recovering payload from fields
Date Fri, 26 Feb 2010 20:45:22 GMT
Hello,

To my knowledge, the character position of tokens is not preserved by
Lucene - only the ordinal position of tokens within a document / field is
preserved.  Thus you need to store this character offset information
separately, say, as Payload data.
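
As a rough illustration of storing offsets as payload data, here is a minimal
sketch that packs a token's start and end character offsets into a byte array
and unpacks them again. The class name OffsetPayloadCodec and the 8-byte
big-endian layout are assumptions made for this example; neither is part of
Lucene, and any other encoding would work equally well.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper (not a Lucene class): converts a token's character
 * offsets to and from the raw bytes that a payload would carry.
 */
final class OffsetPayloadCodec {

    // Assumed layout: two 4-byte big-endian ints, start offset then end offset.
    static final int PAYLOAD_LENGTH = 8;

    private OffsetPayloadCodec() {}

    /** Packs the start and end character offsets into an 8-byte array. */
    static byte[] encode(int startOffset, int endOffset) {
        if (startOffset < 0 || endOffset < startOffset) {
            throw new IllegalArgumentException(
                "invalid offsets: " + startOffset + ", " + endOffset);
        }
        return ByteBuffer.allocate(PAYLOAD_LENGTH)
                .putInt(startOffset)
                .putInt(endOffset)
                .array();
    }

    /** Unpacks an 8-byte array into {startOffset, endOffset}. */
    static int[] decode(byte[] payload) {
        if (payload == null || payload.length != PAYLOAD_LENGTH) {
            throw new IllegalArgumentException("expected an 8-byte payload");
        }
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        return new int[] { buffer.getInt(), buffer.getInt() };
    }
}
```

In a token filter, the encoded bytes would presumably be built from the
OffsetAttribute values of the current token and attached through the
PayloadAttribute; that wiring is not shown here and depends on the Lucene
version in use.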

best,

C>T>

On Fri, Feb 26, 2010 at 3:41 PM, Christopher Condit <condit@sdsc.edu> wrote:

> I'm trying to store semantic information in payloads at index time. I
> believe this part is successful - but I'm having trouble getting access to
> the payload locations after the index is created. I'd like to know the
> offset in the original text for the token with the payload - and get this
> information for all payloads that are set in a Field even if they don't
> relate to the query. I tried (from the highlighting filter):
> TokenStream tokens = TokenSources.getTokenStream(reader, 0, "body");
> while (tokens.incrementToken()) {
>   TermAttribute term = tokens.getAttribute(TermAttribute.class);
>   if (tokens.hasAttribute(PayloadAttribute.class)) {
>     PayloadAttribute payload = tokens.getAttribute(PayloadAttribute.class);
>     OffsetAttribute offset = tokens.getAttribute(OffsetAttribute.class);
>   }
> }
> But the OffsetAttribute never seems to contain any information.
> In my token filter do I need to do more than:
> offsetAtt = addAttribute(OffsetAttribute.class);
> during construction in order to store Offset information?
>
> Thanks,
> -Chris


-- 
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
