lucene-dev mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: too many hits - OutOfMemoryError
Date Thu, 29 May 2003 16:35:47 GMT
Ype Kingma wrote:
> The source of the problem is with the wildcards, so wouldn't it be better
> to enforce a max. nr of expanded terms on these types of queries?
> That would allow finer control than on 'top level'.

That would provide more flexibility, but also more complexity.  There 
are three types of query that expand into BooleanQuery: FuzzyQuery, 
PrefixQuery, and WildcardQuery.  FuzzyQuery and WildcardQuery share a 
base class (MultiTermQuery), so they could be controlled by a single 
parameter, or one could make the parameter specific to each, or both. 
PrefixQuery would need a separate parameter.

My inclination is to first add the top-level parameter to BooleanQuery 
that limits all of these.  Then, if finer-grained control is desired, we 
could add more parameters.
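
A minimal sketch of what such a top-level limit might look like, written as a
standalone class rather than against the real BooleanQuery. The class name
LimitedBooleanQuery, the exception name, the setter, and the default of 1024
are all assumptions made for illustration, not an existing API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a boolean query; NOT Lucene's actual class.
final class LimitedBooleanQuery {

    /** Thrown when adding a clause would exceed the configured limit. */
    static final class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int max) {
            super("query expands to more than " + max + " clauses");
        }
    }

    // One global limit shared by every query type that expands into
    // clauses (fuzzy, prefix, wildcard). The default is an assumed value.
    private static int maxClauseCount = 1024;

    // Clauses are modelled as plain term strings to keep the sketch small.
    private final List<String> clauses = new ArrayList<>();

    static void setMaxClauseCount(int max) {
        if (max < 1) {
            throw new IllegalArgumentException("max must be >= 1");
        }
        maxClauseCount = max;
    }

    static int getMaxClauseCount() {
        return maxClauseCount;
    }

    /** Adds one clause, failing fast once the limit is reached. */
    void add(String term) {
        if (clauses.size() >= maxClauseCount) {
            throw new TooManyClausesException(maxClauseCount);
        }
        clauses.add(term);
    }

    int size() {
        return clauses.size();
    }
}
```

Because the check lives in add(), any expanding query that builds its clauses
through this one method is covered without needing its own parameter.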

> Also it would be possible to interact when the number of expanded
> terms grows out of control: i.e. does the user really want 
> all these expanded terms, or would the user prefer to select
> some of the expanded terms?

That's an interesting thought.  What criteria would you use for 
selection?  One might limit the expansion to the more frequent terms. 
Do folks think that would be useful?  Is someone interested in 
implementing it?
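
As a rough sketch of the "more frequent terms" criterion: given the document
frequency of each indexed term, keep only the top N terms that match a prefix.
The class FrequentTermExpansion, its method, and the use of a plain Map in
place of the index's term dictionary are assumptions for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; a Map stands in for the index's term dictionary.
final class FrequentTermExpansion {

    /**
     * Returns at most maxTerms terms starting with prefix, ordered by
     * descending document frequency (ties broken alphabetically).
     */
    static List<String> expandPrefix(Map<String, Integer> docFreqs,
                                     String prefix, int maxTerms) {
        // Collect every term that matches the prefix.
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                matches.add(e);
            }
        }

        // Most frequent first; alphabetical order keeps the result stable.
        matches.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first maxTerms entries.
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < matches.size() && i < maxTerms; i++) {
            selected.add(matches.get(i).getKey());
        }
        return selected;
    }
}
```

This version sorts all matching terms, which is fine for a sketch; a real
implementation would more likely keep a bounded priority queue while
enumerating terms.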

My hunch is that most queries that expand to large numbers of terms are 
not useful queries.  They're also very slow, and many (most?) users 
might not wait for results anyway.  I think it's better to get an error 
message up front indicating that the query is too vague.

> I realize such interaction features are not needed for the average
> user, so the only thing I'd like to have is that Lucene allows for
> adding such features without needing to move Lucene functionality
> through its class hierarchy.

Lucene allows for adding whatever features folks wish to contribute!  So 
if you have a concrete idea for a term expansion API, or, better yet, 
an implementation, please send it.

> OTOH a 'top level' control for a max. nr of clauses wouldn't hurt:
> one could always set it very high, not bother about it there,
> and leave the finer control to the wildcard query terms.

Exactly my thoughts.  The first thing to add is the top-level control, 
then, if and when folks have good ideas for lower-level control, we can 
consider adding those.

Doug



