lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] [Commented] (SOLR-2519) Improve the defaults for the "text" field type in default schema.xml
Date Mon, 16 May 2011 18:28:47 GMT


Hoss Man commented on SOLR-2519:

bq. Also: existing users would be unaffected by this? They've already copied over / edited
their own schema.xml? This is mainly about new users?

The trap we've seen with this type of thing in the past (e.g. the numeric fields) is that people
who tend to use the example configs w/o changing them much refer to the example field types
by name when talking about them on the mailing list, not considering that those names can
have different meanings depending on the version.

If we make radical changes to a {{<fieldType/>}} but leave the name alone, it could
confuse a lot of people, e.g.: "i tried using the 'text' field but it didn't work"; "which version
of solr are you using?"; "Solr 4.1"; "that should work, what exactly does your schema look
like"; "..."; "that's the schema from 3.6"; "yeah, i started with 3.6 and then upgraded to
4.1 later", etc...

Bottom line: it's less confusing to *remove* a {{<fieldType/>}} and add new ones with
new names than to make radical changes to existing ones.
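
A rough sketch of what "a new one with a new name" could look like, assuming the two changes
proposed in the issue description (StandardTokenizer, auto-phrase off). The name
{{text_general}}, the {{positionIncrementGap}} value and the lowercase filter are only
illustrative assumptions here, not something taken from the attached patch:

{code:xml}
<!-- Hypothetical replacement type: a new name, so "text" never silently changes meaning -->
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="false"> <!-- auto-phrase turned off -->
  <analyzer>
    <!-- StandardTokenizer instead of WhitespaceTokenizer -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- assumed filter, shown only to make the chain look realistic -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}

With something like this, a user who says "i'm using 'text_general'" can only be talking about
the new behavior, and a schema that still declares the old 'text' type is obviously one that
was copied from an earlier release.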

> Improve the defaults for the "text" field type in default schema.xml
> --------------------------------------------------------------------
>                 Key: SOLR-2519
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.2, 4.0
>         Attachments: SOLR-2519.patch
> Spinoff from:
> The text fieldType in schema.xml is unusable for non-whitespace
> languages, because it has the dangerous auto-phrase feature (of
> Lucene's QP -- see LUCENE-2458) enabled.
> Lucene leaves this off by default, as does ElasticSearch
> (http://
> Furthermore, the "text" fieldType uses WhitespaceTokenizer when
> StandardTokenizer is a better cross-language default.
> Until we have language specific field types, I think we should fix
> the "text" fieldType to work well for all languages, by:
>   * Switching from WhitespaceTokenizer to StandardTokenizer
>   * Turning off auto-phrase
