Doug,
On Thursday 29 May 2003 09:35, Doug Cutting wrote:
> Ype Kingma wrote:
> > The source of the problem is with the wildcards, so wouldn't it be better
> > to enforce a maximum number of expanded terms on these types of queries?
> > That would allow finer control than at the 'top level'.
>
> That would provide more flexibility, but also more complexity. There
> are three types of query that expand into BooleanQuery: FuzzyQuery,
> PrefixQuery, and WildcardQuery. FuzzyQuery and WildcardQuery share a
> base class (MultiTermQuery), so they could be controlled by a single
> parameter, or one could make the parameter specific to each, or both.
> PrefixQuery would need a separate parameter.
>
> My inclination is to first add the top-level parameter to BooleanQuery
> that limits all of these. Then, if finer-grained control is desired, we
> could add more parameters.
>
> > Also it would be possible to interact when the number of expanded
> > terms grows out of control: i.e. does the user really want
> > all these expanded terms, or would the user prefer to select
> > some of the expanded terms?
>
> That's an interesting thought. What criteria would you use for
> selection? One might limit the expansion to the more frequent terms.
> Do folks think that would be useful? Is someone interested in
> implementing it?

I think the actual interaction needed for term selection by a user
should be left out of Lucene. That leaves an API for selecting a subset
from a set of terms, which should be straightforward.
Whether to limit expansion to the more frequent terms depends very much
on the user's intention. E.g. when one needs high recall, it's not advisable.

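To make the subset selection idea a bit more concrete, here is a minimal
sketch in Java. None of these names (TermSubsetSketch, ExpandedTerm,
TermSelector, mostFrequent) exist in Lucene; they are hypothetical and only
show the shape such an API could take. The application would supply the
selector, so any user interaction stays outside Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; none of these types are part of Lucene. */
final class TermSubsetSketch {

    /** An expanded term together with its document frequency. */
    static final class ExpandedTerm {
        private final String text;
        private final int docFreq;

        ExpandedTerm(String text, int docFreq) {
            this.text = text;
            this.docFreq = docFreq;
        }

        String text() { return text; }
        int docFreq() { return docFreq; }
    }

    /** Supplied by the application: decides which expanded terms to keep. */
    interface TermSelector {
        List<ExpandedTerm> select(List<ExpandedTerm> candidates);
    }

    /** Keeps every term; appropriate when high recall is needed. */
    static final TermSelector KEEP_ALL = candidates -> candidates;

    /** Keeps the n terms with the highest document frequency (ties: alphabetical). */
    static TermSelector mostFrequent(int n) {
        return candidates -> {
            List<ExpandedTerm> sorted = new ArrayList<>(candidates);
            sorted.sort(Comparator.comparingInt(ExpandedTerm::docFreq).reversed()
                    .thenComparing(ExpandedTerm::text));
            int keep = Math.max(0, Math.min(n, sorted.size()));
            return new ArrayList<>(sorted.subList(0, keep));
        };
    }
}
```

A high-recall application would pass KEEP_ALL, while an interactive one could
implement TermSelector by showing the candidates to the user.
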
> My hunch is that most queries that expand to large numbers of terms are
> not useful queries. They're also very slow, and many (most?) users
> might not wait for results anyway. I think it's better to get an error
> message up front indicating that the query is too vague.
>
> > I realize such interaction features are not needed for the average
> > user, so the only thing I'd like to have is that Lucene allows for
> > adding such features without needing to move Lucene functionality
> > through its class hierarchy.
>
> Lucene allows for adding whatever features folks wish to contribute! So
> if you have a concrete idea for a term expansion API, or, better yet,
> an implementation, please send it.

The implicit good news for me is that you don't consider such features
infeasible in combination with Lucene.
As I said, I haven't even looked at the actual details of term expansion.

I happen to be familiar with a query language in which user selection
of expanded terms is possible. It also requires prefix queries to have at
least 3 characters before the first truncation in order to limit the term
expansion.
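
A check like that is simple to express. The sketch below is only an
illustration: the class and method names are made up, the truncation
characters '*' and '?' are an assumption, and the minimum of 3 is simply
the value that query language happens to use.

```java
/** Hypothetical sketch of a minimum-prefix rule; not part of Lucene. */
final class PrefixLengthCheck {

    /** Assumed minimum number of literal characters before the first wildcard. */
    static final int MIN_PREFIX = 3;

    /** Number of characters before the first '*' or '?' in the pattern. */
    static int literalPrefixLength(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*' || c == '?') {
                return i;
            }
        }
        return pattern.length(); // no wildcard at all
    }

    /** True if the pattern has no wildcard, or a long enough literal prefix. */
    static boolean isAcceptable(String pattern) {
        int prefix = literalPrefixLength(pattern);
        boolean hasWildcard = prefix < pattern.length();
        return !hasWildcard || prefix >= MIN_PREFIX;
    }
}
```

Under these assumptions "luc*" would be accepted, while "lu*" and "*ene"
would be rejected before any term expansion takes place.
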

Your mention of selecting terms with high frequency brings me to
another point.

Terms that inadvertently have a low document frequency (spelling
errors, for example) get a higher term relevancy in query execution
than they actually deserve.
This problem surfaces when term expansion results in such terms.
Is there a way in Lucene to give all expanded terms the same relevancy?

The problem also surfaces when two synonyms
have very different document frequencies and these synonyms
are used together in a query. In this situation one can compensate by
using appropriate query term weights, but a special OR operator
for synonyms might be preferable.
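
To illustrate both cases with numbers, here is a small sketch. It assumes a
common logarithmic inverse document frequency,
idf = ln(numDocs / (docFreq + 1)) + 1, which may differ in detail from what
Lucene's scoring actually does, and all names are hypothetical. The group
weight uses the summed document frequency as a rough upper bound on the
number of documents matching any term of the group.

```java
/** Hypothetical sketch of a shared weight for a group of terms; not Lucene code. */
final class SharedWeightSketch {

    /** Assumed log-based idf: rarer terms get a larger value. */
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    /**
     * One idf for a whole group (expanded terms or synonyms), computed from
     * the summed document frequency, capped at the number of documents.
     */
    static double groupIdf(int[] docFreqs, int numDocs) {
        long sum = 0;
        for (int df : docFreqs) {
            sum += df;
        }
        int combined = (int) Math.min(sum, (long) numDocs);
        return idf(combined, numDocs);
    }
}
```

With 10000 documents, a misspelling with document frequency 1 gets a larger
idf than a correctly spelled term with document frequency 1000. Scoring both
with groupIdf gives them the same weight, which is roughly what I would
expect from an OR operator for synonyms.
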
Kind regards,
Ype Kingma