lucene-solr-user mailing list archives

From Ben Davies <ben.dav...@gmail.com>
Subject Re: Field Analyzers: which values are indexed?
Date Wed, 13 Apr 2011 15:07:36 GMT
Thanks, both, for your replies.

Erick,
Yep, I use the Analysis page extensively, but what I was really looking
for was whether all, or only the last line, of the values shown by the
analysis page were eventually indexed.
I think we've concluded it's only the last line.
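
For anyone finding this thread later, here is a toy Python sketch of that conclusion. It is not Solr code: the function names, the regex, and the stopword list are all made up for illustration. It only mimics the shape of the chain (char filter on the raw input, then the tokenizer, then each token filter in turn) and records every intermediate stage the way the Analysis page displays them, with only the last stage treated as what gets indexed.

```python
import re

# Hypothetical stand-ins for the three kinds of analysis components.
# None of these are Solr APIs; they only imitate the order of operations.

def char_filter(text):
    # Like a pattern-replace char filter: runs once on the whole raw input,
    # before any tokenization happens.
    return re.sub(r"[-_]", " ", text)

def tokenizer(text):
    # Like a whitespace tokenizer: splits the filtered input into tokens.
    return text.split()

def lowercase_filter(tokens):
    # Token filter: applied to each token emitted by the tokenizer.
    return [t.lower() for t in tokens]

def stop_filter(tokens, stopwords=frozenset({"the", "a", "of"})):
    # Token filter: drops tokens found in an (assumed) stopword list.
    return [t for t in tokens if t not in stopwords]

def analyze(text):
    # Record the output of every stage, as the Analysis page does.
    stages = []
    text = char_filter(text)
    stages.append(("char_filter", text))
    tokens = tokenizer(text)
    stages.append(("tokenizer", list(tokens)))
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
        stages.append((token_filter.__name__, list(tokens)))
    return stages

stages = analyze("The Quick-Brown_Fox")
for name, output in stages:
    print(name, output)

# Only the output of the last stage in the chain is treated as indexed.
indexed_terms = stages[-1][1]
print("indexed:", indexed_terms)
```

The intermediate rows are useful for debugging the chain, but in this sketch nothing from them reaches `indexed_terms` except what survives to the final stage.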

Cheers,
Ben

On Wed, Apr 13, 2011 at 2:41 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> CharFilterFactories are applied to the raw input before tokenization.
> Each token output from the tokenization is then sent through
> the rest of the chain.
>
> The Analysis page available from the Solr admin page is
> invaluable in answering in great detail what each part of
> an analysis chain does.
>
> TokenFilterFactories are applied to each token emitted by
> the tokenizer; this includes the similarly named
> PatternReplaceFilterFactory. The difference is that
> PatternReplaceCharFilterFactory is applied to the entire
> input stream before tokenization, whereas
> PatternReplaceFilterFactory is applied to each token
> emitted by the tokenizer.
>
> And to make it even more fun, you can do both!
>
> Best
> Erick
>
> On Wed, Apr 13, 2011 at 8:14 AM, Ben Davies <ben.davies@gmail.com> wrote:
>
> > Hi there,
> >
> > Just a quick question that the wiki page
> > (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem
> > to answer very well.
> >
> > Given an analyzer that has zero or more Char Filter Factories, one
> > Tokenizer Factory, and zero or more Token Filter Factories, which
> > value(s) are indexed?
> >
> > Is every value that is produced by each char filter, tokenizer, and
> > filter indexed?
> > Or is only the final value, after completing the whole chain, indexed?
> >
> > Cheers,
> > Ben
> >
>
