lucene-dev mailing list archives

From Konrad Scherer <bcdh...@uottawa.ca>
Subject Re: New PhrasePrefixQuery.java
Date Wed, 20 Nov 2002 19:46:46 GMT

>
>I don't like extending Term.  An instance of a subclass should make sense 
>anywhere its base class is, and that is not really the case here.   A 
>WildcardTerm should not in general be passed to IndexReader methods, 
>etc.  It looks like you've hacked around this, so that it won't actually 
>crash, but this doesn't strike me as an appropriate use of subclassing.

I agree that it wasn't very elegant.

>I think it would be good to get this functionality into the Query 
>parser.  There is currently a gap between what is trivially available in 
>the query parser (strings with wildcard characters) and the 
>PhrasePrefixQuery API (an array of terms).  What it seems to me is needed 
>is just a utility method somewhere that expands a wildcarded string into 
>an array of terms.  This is probably best done in 
>PhrasePrefixQuery.scorer, when an IndexReader is available.  So the 
>approach I would suggest is extending the API of PhrasePrefixQuery with a 
>method like:
>   PhrasePrefixQuery.addTermPrefix(Term term);
>or
>   PhrasePrefixQuery.addWildcardTerm(Term term);
>where the term.text() contains either a term prefix or a wildcard 
>pattern.  Then, in the scorer() implementation this can be expanded. 
>PhrasePrefixQuery would then need to do some bookkeeping to identify which 
>terms need expansion.
>
>Does this make sense?
Yes, it makes sense, but there is a problem. Expanding a wildcard requires an 
IndexReader. I chose the prepare method because the wildcard term can then be 
expanded before sumOfSquaredWeights is called. That function requires the 
wildcard term to be already expanded. The relevant code follows:

// Sum the idf of each term that the wildcard term was expanded to.
Term[] terms = ((Term)o).getTerms();
for (int j = 0; j < terms.length; j++) {
     _idf += searcher.getSimilarity().idf(terms[j], searcher);
}
I must admit that I do not understand the weighting system at all; I haven't 
taken the time to think about it yet. Is it necessary to have all the terms 
for the weighting system to work? It would be strange to expand the 
wildcard within this function, even if it were possible to retrieve an 
IndexReader from the IndexSearcher. If the math can be redone so that it does 
not need the expanded wildcard term, then I will create a new version 
of PhrasePrefixQuery that expands the term within the scorer. That 
would do away with WildcardTerm (and the changes to Term) entirely.
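
To make the expansion step concrete, here is a minimal sketch of turning a 
term prefix into the list of matching terms. It is not Lucene code: the 
class PrefixExpansionSketch and the method expandPrefix are hypothetical 
names, and a java.util.SortedSet stands in for the sorted term dictionary 
that an IndexReader would provide. The only assumption carried over is that 
terms can be enumerated in sorted order starting from a given position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixExpansionSketch {

    // Hypothetical helper: collect every term text that starts with prefix.
    // sortedTerms is a stand-in for the index's sorted term dictionary.
    static List<String> expandPrefix(SortedSet<String> sortedTerms, String prefix) {
        List<String> expanded = new ArrayList<>();
        // tailSet positions us at the first term >= prefix, much like
        // seeking a term enumeration to a starting term.
        for (String text : sortedTerms.tailSet(prefix)) {
            if (!text.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            expanded.add(text);
        }
        return expanded;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        terms.add("ape");
        terms.add("apple");
        terms.add("application");
        terms.add("apply");
        terms.add("banana");
        // Print the terms that the prefix "app" expands to.
        System.out.println(expandPrefix(terms, "app"));
    }
}
```

Done this way, the query would only need to remember which of its terms are 
prefixes and could defer the expansion until a reader is at hand. Whether 
the weighting can tolerate that delay is the open question above.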
Thank you

Konrad



