lucene-java-user mailing list archives

From "Doron Cohen" <cdor...@gmail.com>
Subject Re: Payloads and tokenizers
Date Thu, 14 Aug 2008 10:17:08 GMT
IIRC, the first versions of the patches that added payload support had this
notion of a payload per field rather than per token, but it was later changed
to be per token only.

I have seen two code patterns for adding payloads to tokens.

The first one creates the field text with a reserved separator/delimiter.
The analyzer later recognizes the delimiter, separates the payload part
from the token part, creates the token, and sets the payload.
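
To make the first pattern concrete, here is a minimal plain-Java sketch of
the splitting step only. It deliberately uses no Lucene classes:
DelimitedPayloadSketch, TokenWithPayload, parse and the '|' delimiter are all
made-up names for illustration. In a real analyzer this logic would sit
inside the tokenizer or filter that creates the token and sets its payload.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java stand-in for the delimiter pattern; none of these names are Lucene APIs.
final class DelimitedPayloadSketch {

    /** Hypothetical holder for a token's text plus its payload bytes. */
    static final class TokenWithPayload {
        final String text;
        final byte[] payload; // null when the field text carried no delimiter

        TokenWithPayload(String text, byte[] payload) {
            this.text = text;
            this.payload = payload;
        }
    }

    /** Splits field text such as "A1|B1" at the first delimiter. */
    static TokenWithPayload parse(String fieldText, char delimiter) {
        int pos = fieldText.indexOf(delimiter);
        if (pos < 0) {
            // No delimiter: the whole text is the token and there is no payload.
            return new TokenWithPayload(fieldText, null);
        }
        String text = fieldText.substring(0, pos);
        byte[] payload = fieldText.substring(pos + 1).getBytes(StandardCharsets.UTF_8);
        return new TokenWithPayload(text, payload);
    }

    public static void main(String[] args) {
        TokenWithPayload t = parse("A1|B1", '|');
        System.out.println(t.text + " -> " + new String(t.payload, StandardCharsets.UTF_8));
    }
}
```

With the ids from the question, the field text would be "A1|B1", "A2|B2" and
"A3|B3". The delimiter has to be a character that can never occur in the
token part.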

The other pattern is to create the field from a TokenStream. This can be done
only for fields that are not stored. Here you can create the token in advance,
and there is a SingleTokenStream (I think that is what it is called) to wrap
it in case it is a single token. Since the token is created in advance, no
analysis takes place, and you can set the payload of that token on the spot.
I prefer this pattern: it is more efficient and more elegant.
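
As a rough model of this second pattern, the plain-Java sketch below builds
the token and its payload up front and wraps it in a stream that hands it out
exactly once. Nothing here is a Lucene class: PrebuiltTokenSketch,
PrebuiltToken, SingleTokenSource and forId are hypothetical names, and
next() returning null at the end is an assumption of this sketch. In Lucene
itself the pre-built token would go into a field that is created from a
TokenStream.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java model of the "pre-built token" pattern; not Lucene API.
final class PrebuiltTokenSketch {

    /** Hypothetical token that already carries its payload. */
    static final class PrebuiltToken {
        final String text;
        final byte[] payload;

        PrebuiltToken(String text, byte[] payload) {
            this.text = text;
            this.payload = payload;
        }
    }

    /** Hypothetical stream that returns one token, then null to signal the end. */
    static final class SingleTokenSource {
        private PrebuiltToken pending;

        SingleTokenSource(PrebuiltToken token) {
            this.pending = token;
        }

        PrebuiltToken next() {
            PrebuiltToken token = pending;
            pending = null; // exhausted after the first call
            return token;
        }
    }

    /** Builds the token for an external id and attaches the related id as payload. */
    static SingleTokenSource forId(String id, String payloadId) {
        byte[] payload = payloadId.getBytes(StandardCharsets.UTF_8);
        return new SingleTokenSource(new PrebuiltToken(id, payload));
    }

    public static void main(String[] args) {
        SingleTokenSource source = forId("A1", "B1");
        for (PrebuiltToken t = source.next(); t != null; t = source.next()) {
            System.out.println(t.text + " -> " + new String(t.payload, StandardCharsets.UTF_8));
        }
    }
}
```

No text is parsed at any point, which is why this variant needs neither a
reserved delimiter nor a custom analyzer.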

Doron

On Thu, Aug 14, 2008 at 6:14 AM, Antony Bowesman <adb@teamware.com> wrote:

> I started playing with payloads and have been trying to work out how to get
> the data into the payload.
>
> I have a field where I want to add the following untokenized fields
>
> A1
> A2
> A3
>
> With these fields, I would like to add the payloads
>
> B1
> B2
> B3
>
> Firstly, it looks like you cannot add payloads to untokenized fields.  Is
> this correct?  In my usage, A and B are simply external Ids so must not be
> tokenized and there is always a 1-->1 relationship between them.
>
> Secondly, what is the way to provide the payload data to the tokenizer?  It
> looks like I have to add a List/Map of payload data to a custom Tokenizer
> and Analyzer, which is then consumed on each "next(Token)" call.  However,
> it would be nice if, in my use case, I could use some kind of construct like:
>
> Document doc = new Document();
> Field f = new Field("myField", "A1", Field.Store.NO,
> Field.Index.UNTOKENIZED);
> f.setPayload("B1");
> doc.add(f);
>
> and avoid the whole unnecessary Tokenizer/Analyzer overhead and give
> support for payloads in untokenized fields.
>
> It looks like it would be trivial to implement in
> DocumentsWriter.invertField().  Or would this corrupt the Fieldable
> interface in an undesirable way?
>
> Antony
