lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antony Bowesman <...@teamware.com>
Subject Re: Payloads and tokenizers
Date Fri, 15 Aug 2008 01:34:59 GMT
Thanks for your comments Doron.  I found the earlier discussions on the dev list 
(21/12/06), where this issue is discussed - my use case is similar to Nadav Har'El.

Implementing payloads via Tokens explicitly prevents the use of payloads for 
untokenized fields, as they only support field.stringValue().  There seems no 
way to override this.

My field is currently stored, so the tokenStream approach you suggested, 
(Lucene-580) will not work as it's theoretically only for non-stored fields.  In 
practice, I expect I can create a stored/indexed Field with a dummy string 
value, then use setValue(TokenStream).  At least I can have stored fields with 
Payloads using the analyzer/tokenStream route.  Is this illegal?

What if the Fieldable had a tokenValue(), in addition to the existing 
stream/string/binary/reader values, which could be used for untokenized fields 
and used in invertField()?

I'd rather stick with core Lucene than start making proprietary changes, but it 
seems I can't quite get to where I want to be without some quite cludgy code for 
a very simple use case :(

Antony



Doron Cohen wrote:
> IIRC first versions of patches that added payloads support had this notion
> of payload by field rather than by token, but later it was modified to be by
> token only.
> 
> I have seen two code patterns to add payloads to tokens.
> 
> The first one created the field text with a reserved separator/delimiter
> which was later identified by the analyzer who separated the payload part
> from the token part, created the token and set the payload.
> 
> The other pattern was to create a field with a TokenStream. Can be done only
> for non storable fields. Here you can create the token in advance, and you
> have a SingleTokenStream (I think this is how it is called) to wrap it in
> case it is a single token. Since the token is created in advance, there's no
> analysis going on, and you can set the payload of that token on the spot.I
> prefer this pattern - more efficient and elegant.
> 
> Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message