lucene-dev mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: New PhrasePrefixQuery.java
Date Wed, 20 Nov 2002 19:03:42 GMT
Konrad Scherer wrote:
> I have modified QueryParser.jj and PhrasePrefixQuery.java to allow 
> wildcard searches within phrases. This turned out to be a very involved 
> change going through a few revisions. I have tried to make the changes 
> as clean as possible.

Thanks for taking the time to work on this.  I hope your patience continues.

> Some points:
> 1) I created a WildcardTerm class which extends Term. Originally Term 
> was final. My changes shouldn't affect anyone unless there is a reason 
> Term must remain final which I have not noticed.

I don't like extending Term.  An instance of a subclass should make 
sense anywhere its base class is, and that is not really the case here.
A WildcardTerm should not in general be passed to IndexReader
methods, etc.  It looks like you've hacked around this, so that it won't 
actually crash, but this doesn't strike me as an appropriate use of 
subclassing.

> 2) PhrasePrefixQuery.java has been completely rewritten. 

And you added meaningful comments!  Bravo!

> Extending Term
> helped simplify this class considerably. A PhrasePrefixQuery is now a
> vector of Terms (or WildcardTerms). The wildcard Terms are expanded
> through the prepare() call from Query.

Unfortunately, the prepare() method has not proven to be a great way to 
do things.  The problem is that, with MultiSearcher, it is called 
multiple times, once for each underlying IndexReader that is searched. 
If, for example, MultiSearcher spawned a thread to search each of the 
sub-indexes, then when prepare() is called in each thread it would 
modify the terms in the query in different ways, and they would 
conflict.  You could add some synchronization code into MultiTermQuery, 
but it's really better if all query invocation state is either on the 
stack or in the Scorer.  I think just about every use of prepare() has 
resulted in a bug.  Long term, I think this method should probably be removed.

The previous implementation managed without using prepare.
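
To make the state problem concrete, here is a minimal sketch of the
alternative: the query object is immutable and the expansion is computed
per index and handed back to the caller. ToyIndex and ToyPrefixQuery are
hypothetical stand-ins invented for this illustration, not Lucene classes;
in Lucene terms the returned list would be owned by the Scorer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for one sub-index: just a sorted set of term texts.
final class ToyIndex {
    private final TreeSet<String> terms = new TreeSet<>();

    ToyIndex(String... ts) {
        for (String t : ts) terms.add(t);
    }

    // All terms starting with the prefix, in sorted order.
    List<String> termsWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break; // past the prefix range
            out.add(t);
        }
        return out;
    }
}

// The query holds only what the user asked for and is never modified
// by a search, so it can be shared across threads and sub-indexes.
final class ToyPrefixQuery {
    private final String prefix;

    ToyPrefixQuery(String prefix) {
        this.prefix = prefix;
    }

    // The expansion lives in a local list returned to the caller;
    // nothing is written back into the query.
    List<String> expand(ToyIndex index) {
        return index.termsWithPrefix(prefix);
    }
}
```

Because expand() writes nothing into the query, searching a second
sub-index cannot disturb the expansion already computed for the first.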

I think it would be good to get this functionality into the Query 
parser.  There is currently a gap between what is trivially available in 
the query parser (strings with wildcard characters) and the 
PhrasePrefixQuery API (an array of terms).  It seems to me that all that is
needed is a utility method somewhere that expands a wildcarded 
string into an array of terms.  This is probably best done in 
PhrasePrefixQuery.scorer, when an IndexReader is available.  So the 
approach I would suggest is extending the API of PhrasePrefixQuery with 
a method like:
   PhrasePrefixQuery.addTermPrefix(Term term);
or
   PhrasePrefixQuery.addWildcardTerm(Term term);
where the term.text() contains either a term prefix or a wildcard 
pattern.  Then, in the scorer() implementation this can be expanded. 
PhrasePrefixQuery would then need to do some bookkeeping to identify 
which terms need expansion.
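
A rough sketch of that bookkeeping follows. It is not the real
PhrasePrefixQuery: SketchPhraseQuery, expandForScorer and the use of plain
strings in place of Term and IndexReader are all assumptions made to keep
the example self-contained. Wildcards are assumed to be '*' (any run of
characters) and '?' (any single character).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: one entry per phrase position, plus a flag
// recording whether that entry must be expanded against the index.
final class SketchPhraseQuery {
    private final List<String> texts = new ArrayList<>();
    private final List<Boolean> needsExpansion = new ArrayList<>();

    // A literal term at the next phrase position.
    void add(String termText) {
        texts.add(termText);
        needsExpansion.add(false);
    }

    // A wildcard pattern at the next phrase position.
    void addWildcardTerm(String pattern) {
        texts.add(pattern);
        needsExpansion.add(true);
    }

    // Called at scorer-creation time, once an index is available.
    // Returns the alternatives for each position without touching
    // the query's own fields.
    List<List<String>> expandForScorer(Iterable<String> indexTerms) {
        List<List<String>> positions = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            List<String> alternatives = new ArrayList<>();
            if (needsExpansion.get(i)) {
                Pattern p = toRegex(texts.get(i));
                for (String t : indexTerms) {
                    if (p.matcher(t).matches()) alternatives.add(t);
                }
            } else {
                alternatives.add(texts.get(i));
            }
            positions.add(alternatives);
        }
        return positions;
    }

    // Translate '*' and '?' into regex; quote everything else.
    private static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }
}
```

For example, add("new") followed by addWildcardTerm("yor*") yields two
positions; the second is filled with whichever index terms match the
pattern at the time the scorer is built.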

Does this make sense?

Doug
