lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2119) IndexSchema should log warning if <analyzer> is declared with charfilter/tokenizer/tokenfiler out of order
Date Tue, 14 Sep 2010 23:12:34 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909511#action_12909511 ]

Robert Muir commented on SOLR-2119:
-----------------------------------

{quote}
There seems to be a segment of the user population that has a hard time understanding the
distinction between a charfilter, a tokenizer, and a tokenfilter - while we can certainly
try to improve the documentation about what exactly each does, and when they take effect in
the analysis chain, one other thing we should do is try to educate people when they construct
their <analyzer> in a way that doesn't make any sense.
{quote}

I think we should do both; this is a great idea.

{quote}
(we could easily make such a situation fail to initialize, but I'm not convinced that would
be the best course of action, since some people may have schemas where they have declared
a charFilter or tokenizer out of order relative to their tokenFilters, but are still getting
"correct" results that work for them, and breaking their instance on upgrade doesn't seem
like it would be productive)
{quote}

I would prefer a hard error. I think someone who doesn't understand what tokenizers and filters
do likely isn't looking at their log files either.
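
To make the warning-versus-hard-error choice concrete, here is a minimal sketch of an order check for a single <analyzer> element. It is not Solr's code: the function names, the `strict` flag, and the message wording are assumptions made for illustration, and only the element names charFilter, tokenizer, and filter come from the schema format discussed here.

```python
import logging
import xml.etree.ElementTree as ET

# Assumed stage order for the check: char filters, then the tokenizer, then token filters.
STAGE_RANK = {"charFilter": 0, "tokenizer": 1, "filter": 2}


def find_order_problems(analyzer_xml):
    """Return one message per child element declared after a later-stage element."""
    root = ET.fromstring(analyzer_xml)
    problems = []
    highest_rank = -1
    highest_tag = None
    for child in root:
        rank = STAGE_RANK.get(child.tag)
        if rank is None:
            continue  # unrecognized element names are outside this check
        if rank < highest_rank:
            problems.append(f"<{child.tag}> declared after <{highest_tag}>")
        else:
            highest_rank, highest_tag = rank, child.tag
    return problems


def check_analyzer(analyzer_xml, strict=False):
    """Log a warning per problem, or raise ValueError when strict is True."""
    problems = find_order_problems(analyzer_xml)
    if problems and strict:
        raise ValueError("; ".join(problems))
    for message in problems:
        logging.warning("analyzer declared out of order: %s", message)
    return problems
```

With `strict=False` the schema still loads and the problem only shows up in the log; with `strict=True` initialization fails, which is the behavior I am arguing for.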

In my opinion, Solr should be more picky about its configuration. Oftentimes, if I haven't
had enough sleep, I will type tokenFilter instead of filter, and Solr simply ignores it completely
instead of raising an error.

And I can't be the only one who does this: it's not obvious that tokenizer = Tokenizer, charFilter
= CharFilter, analyzer = Analyzer, but filter = TokenFilter.
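
The same idea applies to misspelled element names such as tokenFilter. A rough sketch, assuming the only valid children of <analyzer> are charFilter, tokenizer, and filter; the function name `unknown_children` is hypothetical and this is not how Solr itself parses the schema:

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: these are the only child elements an <analyzer> accepts.
KNOWN_CHILDREN = {"charFilter", "tokenizer", "filter"}


def unknown_children(analyzer_xml):
    """Return the tag names of child elements that would otherwise be silently ignored."""
    root = ET.fromstring(analyzer_xml)
    return [child.tag for child in root if child.tag not in KNOWN_CHILDREN]
```

A picky loader could refuse to start whenever this list is non-empty, so a typo surfaces immediately instead of producing an analyzer that quietly lacks a filter.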


> IndexSchema should log warning if <analyzer> is declared with charfilter/tokenizer/tokenfiler
out of order
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2119
>                 URL: https://issues.apache.org/jira/browse/SOLR-2119
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Hoss Man
>
> There seems to be a segment of the user population that has a hard time understanding
the distinction between a charfilter, a tokenizer, and a tokenfilter -- while we can certainly
try to improve the documentation about what exactly each does, and when they take effect in
the analysis chain, one other thing we should do is try to educate people when they construct
their <analyzer> in a way that doesn't make any sense.
> at the moment, some people are attempting to do things like "move the Foo <tokenFilter/>
before the <tokenizer/>" to try and get certain behavior ... at a minimum we should
log a warning in this case that doing that doesn't have the desired effect
> (we could easily make such a situation fail to initialize, but I'm not convinced that
would be the best course of action, since some people may have schemas where they have declared
a charFilter or tokenizer out of order relative to their tokenFilters, but are still getting
"correct" results that work for them, and breaking their instance on upgrade doesn't seem
like it would be productive)
