lucene-solr-user mailing list archives

From Rick Leir <rl...@leirtech.com>
Subject Re: ClassicTokenizer
Date Wed, 10 Jan 2018 21:27:02 GMT
Shawn
I did not express that clearly. 
The reference guide says "The Classic Tokenizer preserves the same behavior as the Standard
Tokenizer of Solr versions 3.1 and previous."

So I am curious: why was StandardTokenizer changed after 3.1 to break on hyphens,
when the old behavior seems to me to work better?
Thanks
Rick

On January 9, 2018 7:07:59 PM EST, Shawn Heisey <apache@elyograg.org> wrote:
>On 1/9/2018 9:36 AM, Rick Leir wrote:
>> A while ago the default was changed to StandardTokenizer from
>> ClassicTokenizer. The biggest difference seems to be that Classic does
>> not break on hyphens. There is also a different character pr(mumble). I
>> prefer the Classic's non-break on hyphens.
>
>To have any ability to research changes, we're going to need to know
>precisely what you mean by "default" in that statement.
>
>Are you talking about the example schemas, or some kind of inherent
>default when an analysis chain is not specified?
>
>Probably the reason for the change is an attempt to move into the
>modern era, become more standardized, and stop using old/legacy
>implementations.  The name of the new default contains the word
>"Standard" which would fit in with that goal.
>
>I can't locate any changes in the last couple of years that change the
>classic tokenizer to standard.  Maybe I just don't know the right place
>to look.
>
>> What was the reason for changing this default? If I understand this
>> better I can avoid some pitfalls, perhaps.
>
>If you are talking about example schemas, then the following may apply:
>
>Because you understand how analysis components work well enough to even
>ask your question, I think you're probably the kind of admin who is
>going to thoroughly customize the schema and not rely on the defaults
>for TextField types that come with Solr.  You're free to continue using
>the classic tokenizer in your schema if that meets your needs better
>than whatever changes are made to the examples by the devs.  The
>examples are only starting points; virtually all Solr installs require
>customizing the schema.
>
>Thanks,
>Shawn
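
For reference, here is a minimal sketch of what keeping the classic tokenizer in a customized schema might look like. The field type name "text_classic" and the lowercase filter are illustrative assumptions and are not taken from any shipped example schema; solr.ClassicTokenizerFactory is the factory named in the reference guide. The guide also describes the classic tokenizer as splitting at hyphens unless the word contains a number, so it is worth checking real terms in the admin UI Analysis screen before relying on this.

```xml
<!-- Hypothetical field type; the name and filter chain are assumptions for illustration. -->
<fieldType name="text_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Pre-3.2 StandardTokenizer behavior, kept under the "Classic" name. -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <!-- Lowercasing is optional and shown only as a typical follow-on filter. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Swapping the tokenizer class for solr.StandardTokenizerFactory in the same fragment gives a side-by-side comparison of the two behaviors on the same input.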

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 