lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: inconsistency/performance trap of empty terms
Date Fri, 29 Oct 2010 00:20:43 GMT
On Thu, Oct 28, 2010 at 7:59 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:
>
> : Anyway, I think it's possible other users might be in this same
> : situation, with slow performance, and not even realizing it yet...
> : Obviously they can fix this if they go and add LengthFilter, but
> : should we be doing something different?
>
> On one level, I think a big improvement might just be to start encouraging
> more use of LengthFilter with min=1 at the end of analyzers by including
> it at the end of more "example" field types -- we should probably end
> every analyzer with that and RemoveDuplicatesTokenFilterFactory as a
> general pattern.

Why not just discard them completely in, say, the indexer/queryparser?
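
A minimal sketch of that idea in plain Java, assuming the tokens are already available as a list of strings. It does not use Lucene's API: the names EmptyTermDiscard and dropEmptyTerms are hypothetical, and a real indexer or query parser would apply the same check to its token stream instead.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyTermDiscard {
    /**
     * Returns a copy of the input with null and zero-length terms removed.
     * Hypothetical helper: it stands in for a single choke point in the
     * indexer or query parser, not for any existing Lucene class.
     */
    public static List<String> dropEmptyTerms(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token != null && !token.isEmpty()) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [foo, bar]
        System.out.println(dropEmptyTerms(List.of("foo", "", "bar", "")));
    }
}
```

Doing the check in one place means no analyzer chain has to remember to add a length filter at the end.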

>
> How individual Tokenizers and TokenFilters deal with empty tokens seems
> like something that should be case by case -- the Ngram classes should
> allow/create them if the "min" value is 0, the pattern based classes
> should create them if the pattern matches an empty string, etc....

Why should they create them? Is there some use case for the empty term
that you have found? (I can't think of one, except making your search
engine slower!)
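
To make concrete how a pattern-based splitter ends up emitting empty tokens, here is a small sketch using java.util.regex rather than any Lucene tokenizer. EmptyTokenDemo, splitOn, and countEmpty are hypothetical names, and the input string is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class EmptyTokenDemo {
    /** Splits text on the given regex, keeping trailing empty strings (limit = -1). */
    public static List<String> splitOn(String text, String regex) {
        return Arrays.asList(Pattern.compile(regex).split(text, -1));
    }

    /** Counts the zero-length tokens in the list. */
    public static long countEmpty(List<String> tokens) {
        return tokens.stream().filter(String::isEmpty).count();
    }

    public static void main(String[] args) {
        // Adjacent and trailing delimiters each yield an empty token.
        List<String> tokens = splitOn("a,,b,", ",");
        // prints 2
        System.out.println(countEmpty(tokens));
    }
}
```

Two of the four tokens here are empty, and if nothing downstream drops them they would all be indexed as the same empty term.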


