lucene-solr-user mailing list archives

From "Yoav Caspi" <yoa...@gmail.com>
Subject Re: Different tokenizing algorithms for the same stream
Date Fri, 07 Nov 2008 15:33:50 GMT
Thanks, Jerome.

My problem is that Token next(Token result) is given no information about
the current position inside the stream.
I can read characters from the input Reader, but I couldn't find a way to
tell whether I am at the beginning of the input or not.
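
For reference, here is a minimal sketch of the counting idea Jérôme describes
below. It assumes that a fresh tokenizer instance is created for every input,
which is the point still to be confirmed. It deliberately does not use the
Lucene API: SwitchingTokenizer, switchAfter and tokenize are hypothetical names
for illustration only, and the two splitting rules (whitespace first, then any
non-alphanumeric character) are just placeholders for the two real algorithms.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Lucene-independent sketch. The position in the stream is not
 * read from the Reader; it is tracked by the instance as a token counter.
 */
class SwitchingTokenizer {
    private final Reader input;
    private final int switchAfter; // the "X" from the question (assumed parameter)
    private int delivered = 0;     // tokens delivered so far; 0 means start of input

    SwitchingTokenizer(Reader input, int switchAfter) {
        this.input = input;
        this.switchAfter = switchAfter;
    }

    /** Returns the next token, or null when the input is exhausted. */
    String next() throws IOException {
        // Choose the rule from per-instance state, not from the Reader.
        boolean head = delivered < switchAfter;
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            char ch = (char) c;
            // Placeholder rule 1: split on whitespace only.
            // Placeholder rule 2: split on anything that is not a letter or digit.
            boolean delimiter = head
                    ? Character.isWhitespace(ch)
                    : !Character.isLetterOrDigit(ch);
            if (!delimiter) {
                sb.append(ch);
            } else if (sb.length() > 0) {
                break; // a delimiter after some content ends the token
            }
        }
        if (sb.length() == 0) {
            return null; // end of input, nothing left to emit
        }
        delivered++;
        return sb.toString();
    }

    /** Convenience helper: tokenizes a whole string with a fresh instance. */
    static List<String> tokenize(String text, int switchAfter) {
        SwitchingTokenizer t = new SwitchingTokenizer(new StringReader(text), switchAfter);
        List<String> tokens = new ArrayList<>();
        try {
            for (String tok = t.next(); tok != null; tok = t.next()) {
                tokens.add(tok);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```

With switchAfter = 2, SwitchingTokenizer.tokenize("foo-bar baz-qux zip-zap", 2)
yields [foo-bar, baz-qux, zip, zap]. This only works if the counter starts at
zero for each input, so it breaks if instances are reused across inputs without
being reset.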

-J

On Fri, Nov 7, 2008 at 6:13 AM, Jérôme Etévé <jerome.eteve@gmail.com> wrote:

> Hi,
>
> I think you could implement your personalized tokenizer so that it
> changes its behaviour after it has delivered X tokens.
>
> This assumes that a new tokenizer instance is built by the factory for
> every string analyzed, which I believe is true.
>
> Can this be confirmed?
>
> Cheers !
>
> Jerome.
>
>
> On Thu, Nov 6, 2008 at 11:08 PM, Yuri Jan <vaoyca@gmail.com> wrote:
> > Hello all,
> >
> > I'm trying to implement a tokenizer that will behave differently on
> > different parts of the incoming stream.
> > For example, for the first X words in the stream I would like to use one
> > tokenizing algorithm, while for the rest of the stream a different
> > tokenizing algorithm will be used.
> >
> > What is the best way to implement that?
> > Where should I store this stream-related data?
> >
> > Thanks,
> > Yuri
> >
>
>
>
> --
> Jerome Eteve.
>
> Chat with me live at http://www.eteve.net
>
> jerome@eteve.net
>
