lucene-dev mailing list archives

From cutt...@lucene.com
Subject RE: PhrasePrefixQuery.java and MultipleTermPositions.java
Date Wed, 15 May 2002 17:49:35 GMT
This looks great!

The MultipleTermPositions implementation could be made more efficient if it:
  1. avoided constructing an Integer for each position
  2. avoided using a SortedSet to collect positions

These could be achieved by changing your PriorityQueue to be sorted by not
just document, but also by position within document, as in PhraseQueue.

Another optimization would be to use PriorityQueue.adjustTop() instead of
pop() followed by put() -- it's twice as fast.

After these changes, a typical call to MultipleTermPositions.next() would
consist of a call to PriorityQueue.top(), a call to TermPositions.next(),
and a call to PriorityQueue.adjustTop().
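
To make that concrete, here is a rough, self-contained sketch of the
idea. It does not use the real Lucene classes: PositionCursor, CursorHeap
and PositionMergeSketch are hypothetical stand-ins for TermPositions,
PriorityQueue and MultipleTermPositions, and the ordering (document
first, then position) is my reading of the suggestion above, not code
taken from PhraseQueue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one term's TermPositions: a cursor over
// (doc, position) pairs that are already sorted by doc, then position.
final class PositionCursor {
    private final int[][] entries;
    private int index = 0;
    PositionCursor(int[][] entries) { this.entries = entries; }
    boolean isEmpty() { return entries.length == 0; }
    int doc() { return entries[index][0]; }
    int position() { return entries[index][1]; }
    // Advances to the next pair; returns false once the cursor is exhausted.
    boolean next() { return ++index < entries.length; }
}

// Minimal binary min-heap ordered by (doc, position), with an adjustTop()
// that re-sinks the root in place instead of doing pop() plus put().
final class CursorHeap {
    private final PositionCursor[] heap; // 1-based
    private int size = 0;
    CursorHeap(int capacity) { heap = new PositionCursor[capacity + 1]; }

    private static boolean lessThan(PositionCursor a, PositionCursor b) {
        if (a.doc() != b.doc()) return a.doc() < b.doc();
        return a.position() < b.position();
    }
    private void swap(int i, int j) {
        PositionCursor t = heap[i]; heap[i] = heap[j]; heap[j] = t;
    }
    int size() { return size; }
    PositionCursor top() { return size > 0 ? heap[1] : null; }
    void put(PositionCursor c) {
        heap[++size] = c;
        for (int i = size; i > 1 && lessThan(heap[i], heap[i / 2]); i /= 2) {
            swap(i, i / 2);
        }
    }
    void pop() {
        heap[1] = heap[size];
        heap[size--] = null;
        adjustTop();
    }
    // Call after the top element's key has changed: one sift-down pass.
    void adjustTop() {
        int i = 1;
        while (2 * i <= size) {
            int child = 2 * i;
            if (child + 1 <= size && lessThan(heap[child + 1], heap[child])) child++;
            if (!lessThan(heap[child], heap[i])) break;
            swap(i, child);
            i = child;
        }
    }
}

final class PositionMergeSketch {
    // Merges several cursors into one (doc, position) stream, "doc:pos" each.
    static List<String> merge(PositionCursor... cursors) {
        CursorHeap queue = new CursorHeap(cursors.length);
        for (PositionCursor c : cursors) {
            if (!c.isEmpty()) queue.put(c);
        }
        List<String> out = new ArrayList<>();
        while (queue.size() > 0) {
            PositionCursor top = queue.top();          // top()
            out.add(top.doc() + ":" + top.position()); // no Integer, no SortedSet
            if (top.next()) {                          // next() on the underlying cursor
                queue.adjustTop();                     // adjustTop()
            } else {
                queue.pop();                           // cursor exhausted, drop it
            }
        }
        return out;
    }
}
```

Each step of the merge loop is exactly the top() / next() / adjustTop()
sequence described above; pop() is only needed when one of the underlying
cursors runs out.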

I hope this makes sense.

Doug  

> -----Original Message-----
> From: Anders Nielsen
> [mailto:anders.at.visator.com@cutting.at.lucene.com]
> Sent: Wednesday, May 15, 2002 5:23 AM
> To: dcutting@grandcentral.com
> Subject: PhrasePrefixQuery.java and MultipleTermPositions.java
> 
> 
> Hello all,
> 
> Attached are two classes that handle phrase queries of the form
> "microsoft app*", where app* is supposed to match all words starting
> with "app".
> 
> It's also possible to handle queries where the prefix term(s) are in the
> middle of the phrase, like so: "Microsoft* app*"
> 
> ---
> 
> PhrasePrefixQuery
> 
> PhrasePrefixQuery.java is a generalized version of PhraseQuery.java, with
> an added method add(Term[]).
> 
> To use this class to search for the phrase "Microsoft app*", first call
> add(Term) with the term "Microsoft", then find all terms that have "app"
> as a prefix using IndexReader.terms(Term), and use
> PhrasePrefixQuery.add(Term[] terms) to add them to the query.
> 
> Known Issues: the method toString() assumes that the first term in an
> array of terms is the prefix for the whole array. That might not
> necessarily be so.
> 
> 
> MultipleTermPositions
> 
> MultipleTermPositions.java is a class that implements the TermPositions
> interface and behaves like a single TermPositions would, iterating
> through <doc, freq, <pos1, pos2, .. , posn>> tuples, but it handles
> multiple TermPositions at once by keeping them in a queue.
> 
> Using this class, it was easy to write PhrasePrefixQuery reusing the
> ExactPhraseScorer and SloppyPhraseScorer directly.
> 
> Known Issues: Doesn't fully implement the TermDocs interface, leaving
> read(int[], int[]) and seek(Term) unsupported.
> 
> 
> Other Comments
> 
> It would possibly be a good idea for IndexReader to have a method
> IndexReader.termPositions(Term[]) which could return a
> MultipleTermPositions-object. But for now the constructor takes an
> IndexReader and an array of Terms.
> 
> Also, to fully integrate this into Lucene, code would have to be added to
> QueryParser that handles queries of this type.
> 
> 
> regards,
> Anders Nielsen
> 
> 
