lucene-solr-user mailing list archives

From Mark Miller
Subject Re: Modifying a stored field after analyzing it?
Date Fri, 10 Jul 2009 19:38:14 GMT
On Fri, Jul 10, 2009 at 3:42 PM, solrcoder wrote:

> markrmiller wrote:
> >
> > When you specify a custom UpdateProcessor chain, you will normally make
> > the RunUpdateProcessor the last processor in the chain, as it will add
> > the doc to Solr.
> > Rather than using the built in RunUpdateProcessor though, you could
> > simply specify your own UpdateProcessor as the last one.
> >
> So, to make sure I understand you:
> 1) As of today, if I were to drop in a custom RequestUpdateProcessor that
> modeled RunUpdateProcessor but did some Document modification, it wouldn't
> help, because today Document fields can't support stored form tokenizing.
> Modifying the fields would just strip the data that the tokenizer would
> need to index properly.
> 2) The patch Yonik submitted, which I read and poorly understood out of
> context, will allow tokenization of the stored form in addition to the
> indexed form, so that from an input text "A" I can produce stored form "B"
> and indexed form "C".
> Yes?
> Again, I didn't understand the patch well, but it looked to me like it only
> provided the ability to say "the tokenizer I'm using on the indexed form
> should be used on the stored form as well."  However, I'll actually need
> *separate* tokenization -- the field
>   one two three four [MARKER] oneprime twoprime threeprime fourprime
> essentially needs the first part stripped for indexing, and the second part
> stripped for storing.  Once Yonik's patch goes live, how would I tell my
> tokenizer to behave differently for the stored form vs the indexed form?
> I'm sure I'm missing something; sorry for the confusion.
Yonik's patch lets you supply the TokenStream straight to the field and
still store an *independent* text value in that stored field. When building
the Lucene Document, you would add the field with the raw TokenStream and
then use setValue to set the stored text.
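
For the [MARKER] example quoted above, the split itself could be sketched
roughly as follows. This is only an illustration in plain Java, not code from
the patch: MarkerSplit, Parts, and split are hypothetical names, and falling
back to the whole text when no marker is present is an assumption. The token
list stands in for what a custom TokenStream would emit for indexing, and the
stored string is the value that would be passed to setValue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Solr or Lucene class): derives an independent
// stored form and indexed form from one raw field value.
final class MarkerSplit {
    static final String MARKER = "[MARKER]";

    // Simple holder for the two forms of the field.
    static final class Parts {
        final String stored;              // text to keep as the stored value
        final List<String> indexedTokens; // tokens to feed to the index

        Parts(String stored, List<String> indexedTokens) {
            this.stored = stored;
            this.indexedTokens = indexedTokens;
        }
    }

    // Text before the marker becomes the stored form; text after the marker
    // is tokenized for indexing.
    static Parts split(String raw) {
        int at = raw.indexOf(MARKER);
        if (at < 0) {
            // Assumption: with no marker, store and index the same text.
            return new Parts(raw.trim(), tokenize(raw));
        }
        String stored = raw.substring(0, at).trim();
        String toIndex = raw.substring(at + MARKER.length());
        return new Parts(stored, tokenize(toIndex));
    }

    // Whitespace tokenization, standing in for a real analyzer chain.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return Collections.unmodifiableList(tokens);
    }

    public static void main(String[] args) {
        Parts p = split("one two three four [MARKER] "
                + "oneprime twoprime threeprime fourprime");
        System.out.println("stored:  " + p.stored);
        System.out.println("indexed: " + p.indexedTokens);
    }
}
```

In a custom final UpdateProcessor, the indexed tokens would be wrapped in a
TokenStream and handed to the field, with the stored string set separately;
that wiring depends on the Lucene version in use and is left out here.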

- Mark
