lucene-solr-user mailing list archives

From "Jérôme Etévé" <jerome.et...@gmail.com>
Subject Re: Different tokenizing algorithms for the same stream
Date Fri, 07 Nov 2008 11:13:31 GMT
Hi,

  I think you could implement your custom tokenizer so that it
changes its behaviour after it has delivered X tokens.

This relies on a new tokenizer instance being built from the factory
for every string analyzed, which I believe is true.

Can anyone confirm this?
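
To make the idea concrete, here is a minimal sketch in plain Java. It is
not written against the Lucene/Solr Tokenizer API; the class name
SwitchingTokenizer, its methods, and the two splitting rules (whitespace
first, then letters/digits only) are all hypothetical and chosen only for
illustration. The point is that the token counter is a field of the
instance, so it only works if each analyzed string gets its own instance.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain Java, NOT the Lucene Tokenizer API.
class SwitchingTokenizer {
    private final Reader input;
    private final int switchAfter; // the "X" from the question
    private int delivered = 0;     // per-stream state, kept in the instance

    SwitchingTokenizer(Reader input, int switchAfter) {
        this.input = input;
        this.switchAfter = switchAfter;
    }

    // Returns the next token, or null at end of stream.
    String next() throws IOException {
        // Algorithm 1 (first X tokens): split on whitespace.
        // Algorithm 2 (afterwards): keep only runs of letters/digits.
        boolean firstPhase = delivered < switchAfter;
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            char ch = (char) c;
            boolean tokenChar = firstPhase
                    ? !Character.isWhitespace(ch)
                    : Character.isLetterOrDigit(ch);
            if (tokenChar) {
                sb.append(ch);
            } else if (sb.length() > 0) {
                break; // a delimiter ends the current token
            }
            // otherwise: skip leading delimiters
        }
        if (sb.length() == 0) {
            return null;
        }
        delivered++;
        return sb.toString();
    }

    // Builds a fresh instance per string, mirroring the assumption
    // that the factory creates one tokenizer per analyzed value.
    static List<String> tokenizeAll(String text, int switchAfter) {
        SwitchingTokenizer t =
                new SwitchingTokenizer(new StringReader(text), switchAfter);
        List<String> out = new ArrayList<>();
        try {
            for (String tok = t.next(); tok != null; tok = t.next()) {
                out.add(tok);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With X = 2, SwitchingTokenizer.tokenizeAll("a-b c-d e-f g-h", 2) yields
[a-b, c-d, e, f, g, h]: the first two tokens keep their hyphens, the rest
are split on them. If an instance were reused across strings, the counter
would have to be reset explicitly before each new stream.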

Cheers !

Jerome.


On Thu, Nov 6, 2008 at 11:08 PM, Yuri Jan <vaoyca@gmail.com> wrote:
> Hello all,
>
> I'm trying to implement a tokenizer that will behave differently on
> different parts of the incoming stream.
> For example, for the first X words in the stream I would like to use one
> tokenizing algorithm, while for the rest of the stream a different
> tokenizing algorithm will be used.
>
> What is the best way to implement that?
> Where should I store this stream-related data?
>
> Thanks,
> Yuri
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jerome@eteve.net
