lucene-dev mailing list archives

From Anders Nielsen <and...@visator.com>
Subject RE: PhrasePrefixQuery.java and MultipleTermPositions.java
Date Wed, 15 May 2002 18:59:07 GMT
Regarding sorting by position: I made MultipleTermPositions.next() pull
out all the positions from each TermPositions because there doesn't seem to
be a way to peek at the current position in a TermPositions object without
moving to the next position (it's like Heisenberg's uncertainty principle:
observing the object changes its state).

As an alternative to putting all the positions into a SortedSet, I could
either make a TermPositions wrapper (a la PhrasePositions) that keeps the
next position as a public variable and make the queue sort on both doc and
position, or build the sorted list of positions with an array of ints
instead of a SortedSet.

I think the latter is probably the most efficient, but I can't really tell
for sure unless I implement both and test. Any suggestions?
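
To make the second alternative concrete, here is a minimal sketch of
collecting positions in a primitive int array and sorting it once, instead
of boxing each position into an Integer for a SortedSet. The class
IntPositionBuffer and its methods are hypothetical names used only for
illustration; they are not part of Lucene or of the attached classes.

```java
import java.util.Arrays;

/** Hypothetical growable buffer of primitive positions (not a Lucene class). */
final class IntPositionBuffer {
    private int[] positions = new int[16];
    private int size = 0;

    /** Appends one position, doubling the backing array when it is full. */
    void add(int position) {
        if (size == positions.length) {
            positions = Arrays.copyOf(positions, size * 2);
        }
        positions[size++] = position;
    }

    /** Sorts only the filled part of the array, in ascending order. */
    void sort() {
        Arrays.sort(positions, 0, size);
    }

    int size() {
        return size;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return positions[index];
    }

    /** Resets the buffer for the next document without reallocating. */
    void clear() {
        size = 0;
    }

    /** Returns a copy of the filled part of the buffer. */
    int[] toArray() {
        return Arrays.copyOf(positions, size);
    }
}
```

One difference to keep in mind: a SortedSet silently drops duplicate
positions, while this buffer keeps them, so the two are only equivalent if
the same position never arrives twice for one document.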

Regarding using adjustTop() instead of pop()-push(): sounds good, I'll make
that change ASAP.


regards,
Anders Nielsen



-----Original Message-----
From: cutting@lucene.com
To: lucene-dev@jakarta.apache.org
Sent: 15-05-2002 19:49
Subject: RE: PhrasePrefixQuery.java and MultipleTermPositions.java

This looks great!

The MultipleTermPositions implementation could be made more efficient if it:
  1. avoided constructing an Integer for each position
  2. avoided using a SortedSet to collect positions

These could be achieved by changing your PriorityQueue to be sorted not
just by document, but also by position within document, as in PhraseQueue.

Another optimization would be to use PriorityQueue adjustTop() instead of
pop() and put() -- it's twice as fast.

After these changes, a typical call to MultipleTermPositions.next() would
consist of a call to PriorityQueue.top(), a call to TermPositions.next(),
and a call to PriorityQueue.adjustTop().
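
The top() / next() / adjustTop() cycle described above can be sketched as
follows. This is an illustration under stated assumptions, not the Lucene
implementation: PositionCursor, CursorQueue and MergedPositions are
hypothetical names, the cursor reads from an in-memory int array rather
than a real TermPositions, and a single next() stands in for both document
and position advancement.

```java
/** Hypothetical stand-in for one term's positions inside a single document. */
final class PositionCursor {
    final int doc;
    private final int[] positions; // assumed non-empty and ascending
    private int index = 0;

    PositionCursor(int doc, int[] positions) {
        this.doc = doc;
        this.positions = positions;
    }

    /** Current position; reading it does not advance the cursor. */
    int position() { return positions[index]; }

    /** Advances to the next position; false once positions are exhausted. */
    boolean next() { return ++index < positions.length; }

    /** Orders by document first, then by position within the document. */
    boolean lessThan(PositionCursor other) {
        if (doc != other.doc) return doc < other.doc;
        return position() < other.position();
    }
}

/** Minimal 1-based binary min-heap offering top(), pop() and adjustTop(). */
final class CursorQueue {
    private final PositionCursor[] heap;
    private int size = 0;

    CursorQueue(int capacity) { heap = new PositionCursor[capacity + 1]; }

    int size() { return size; }

    void put(PositionCursor cursor) {
        heap[++size] = cursor;
        for (int i = size; i > 1 && heap[i].lessThan(heap[i >> 1]); i >>= 1) {
            swap(i, i >> 1); // sift the new entry up
        }
    }

    PositionCursor top() { return size > 0 ? heap[1] : null; }

    /** Removes the smallest entry; callers must check size() > 0 first. */
    PositionCursor pop() {
        PositionCursor result = heap[1];
        heap[1] = heap[size];
        heap[size--] = null;
        downHeap();
        return result;
    }

    /** Restores heap order after the top entry changed in place. */
    void adjustTop() { downHeap(); }

    private void downHeap() {
        int i = 1;
        while ((i << 1) <= size) {
            int child = i << 1;
            if (child < size && heap[child + 1].lessThan(heap[child])) child++;
            if (!heap[child].lessThan(heap[i])) break;
            swap(i, child);
            i = child;
        }
    }

    private void swap(int a, int b) {
        PositionCursor tmp = heap[a];
        heap[a] = heap[b];
        heap[b] = tmp;
    }
}

final class MergedPositions {
    /** Merges the ascending position lists of several terms for one document. */
    static int[] merge(int doc, int[][] perTerm) {
        CursorQueue queue = new CursorQueue(perTerm.length);
        int total = 0;
        for (int[] positions : perTerm) {
            if (positions.length > 0) {
                queue.put(new PositionCursor(doc, positions));
                total += positions.length;
            }
        }
        int[] out = new int[total];
        int n = 0;
        while (queue.size() > 0) {
            PositionCursor top = queue.top(); // smallest (doc, position)
            out[n++] = top.position();
            if (top.next()) {
                queue.adjustTop();            // one sift-down, no pop() + put()
            } else {
                queue.pop();                  // cursor exhausted
            }
        }
        return out;
    }
}
```

No Integer is allocated per position and no intermediate set is built; each
step costs one sift-down on a heap whose size is the number of terms.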

I hope this makes sense.

Doug  

> -----Original Message-----
> From: Anders Nielsen
> [mailto:anders.at.visator.com@cutting.at.lucene.com]
> Sent: Wednesday, May 15, 2002 5:23 AM
> To: dcutting@grandcentral.com
> Subject: PhrasePrefixQuery.java and MultipleTermPositions.java
> 
> 
> Hello all,
> 
> Attached are two classes that handle phrase queries of the form
> "microsoft app*", where app* is supposed to match all words starting
> with "app".
> 
> It's also possible to handle queries where the prefix term(s) are in
> the middle of the phrase, like so: "Microsoft* app*"
> 
> ---
> 
> PhrasePrefixQuery
> 
> PhrasePrefixQuery.java is a generalized version of PhraseQuery.java,
> with an added method add(Term[]).
> 
> To use this class to search for the phrase "Microsoft app*", first use
> add(Term) on the term "Microsoft", then find all terms that have "app"
> as a prefix using IndexReader.terms(Term), and use
> PhrasePrefixQuery.add(Term[] terms) to add them to the query.
> 
> Known Issues: the method toString() assumes that the first term in an
> array of terms is the prefix for the whole array. That might not
> necessarily be so.
> 
> 
> MultipleTermPositions
> 
> MultipleTermPositions.java is a class that implements the
> TermPositions interface and behaves like a single TermPositions would,
> iterating through <doc, freq, <pos1, pos2, .. , posn>> tuples, but it
> handles multiple TermPositions at once by keeping them in a queue.
> 
> Using this class, it was easy to write PhrasePrefixQuery reusing the
> ExactPhraseScorer and SloppyPhraseScorer directly.
> 
> Known Issues: Doesn't fully implement the TermDocs interface, leaving
> read(int[], int[]) and seek(Term) unsupported.
> 
> 
> Other Comments
> 
> It would possibly be a good idea for IndexReader to have a method
> IndexReader.termPositions(Term[]) which could return a
> MultipleTermPositions-object. But for now the constructor takes an
> IndexReader and an array of Terms.
> 
> Also, to fully integrate this into Lucene, code would have to be added
> to QueryParser that handles queries of this type.
> 
> 
> regards,
> Anders Nielsen
> 
> 
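
The prefix expansion step described in the original message (seek to the
prefix in the sorted term dictionary, then collect terms until one no
longer starts with it) can be sketched without Lucene as below. The sorted
term dictionary is modelled with a java.util.SortedSet of strings, and
PrefixExpansion and termsWithPrefix are hypothetical names; with a real
index the same loop would presumably run over the enumeration returned by
IndexReader.terms(Term), and the collected terms would then be passed to
add(Term[]).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class PrefixExpansion {
    /**
     * Collects every term that starts with the prefix. tailSet(prefix)
     * begins at the first term at or after the prefix, so the scan can
     * stop at the first term that no longer matches.
     */
    static List<String> termsWithPrefix(SortedSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>();
        dictionary.add("ap");
        dictionary.add("app");
        dictionary.add("apple");
        dictionary.add("application");
        dictionary.add("apt");
        dictionary.add("microsoft");
        System.out.println(termsWithPrefix(dictionary, "app"));
    }
}
```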

--
To unsubscribe, e-mail:
<mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-dev-help@jakarta.apache.org>