lucene-solr-user mailing list archives

From "Jérôme Etévé" <>
Subject Re: Different tokenizing algorithms for the same stream
Date Fri, 07 Nov 2008 11:13:31 GMT

  I think you could implement your custom tokenizer so that it
changes its behaviour after it has delivered X tokens.

This assumes a new tokenizer instance is built by the factory for
every string analyzed, which I believe is true.

Can anyone confirm this?
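
To make the idea concrete, here is a rough sketch in plain Java. It
deliberately does not use any Lucene or Solr classes: SwitchingTokenizer,
switchAfter and the two splitting rules (whitespace first, then any
non-alphanumeric character) are hypothetical names and choices made up for
illustration. The point is only that the per-stream state (position and
number of tokens delivered) lives in instance fields, so it is only correct
if a fresh instance is created for every string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a Lucene Tokenizer subclass.
class SwitchingTokenizer {
    private final String input;
    private final int switchAfter; // the "X" from the question

    // Per-stream state: valid only if one instance handles one string.
    private int pos = 0;
    private int emitted = 0;

    SwitchingTokenizer(String input, int switchAfter) {
        this.input = input;
        this.switchAfter = switchAfter;
    }

    // Returns the next token, or null when the input is exhausted.
    String next() {
        // The algorithm is chosen once, at the start of each token.
        boolean firstPhase = emitted < switchAfter;
        while (pos < input.length() && isDelimiter(input.charAt(pos), firstPhase)) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !isDelimiter(input.charAt(pos), firstPhase)) {
            pos++;
        }
        emitted++;
        return input.substring(start, pos);
    }

    // First algorithm: split on whitespace only.
    // Second algorithm: split on anything that is not a letter or digit.
    private static boolean isDelimiter(char c, boolean firstPhase) {
        return firstPhase ? Character.isWhitespace(c) : !Character.isLetterOrDigit(c);
    }

    // Convenience helper: builds a fresh instance per string, as assumed above.
    static List<String> tokenize(String input, int switchAfter) {
        SwitchingTokenizer tokenizer = new SwitchingTokenizer(input, switchAfter);
        List<String> tokens = new ArrayList<>();
        for (String tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            tokens.add(tok);
        }
        return tokens;
    }
}
```

With switchAfter = 2, the first two tokens keep their hyphens and the
remaining ones are split on them. If instances were reused across strings,
the counter would have to be reset explicitly at the start of each one.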

Cheers!


On Thu, Nov 6, 2008 at 11:08 PM, Yuri Jan <> wrote:
> Hello all,
> I'm trying to implement a tokenizer that will behave differently on
> different parts of the incoming stream.
> For example, for the first X words in the stream I would like to use one
> tokenizing algorithm, while for the rest of the stream a different
> tokenizing algorithm will be used.
> What is the best way to implement that?
> Where should I store this stream-related data?
> Thanks,
> Yuri

Jerome Eteve.
