lucene-dev mailing list archives

From Tatu Saloranta <t...@hypermall.net>
Subject Iterators for collecting Terms from Queries
Date Fri, 14 Mar 2003 06:42:29 GMT
Ok, I finally finished the classes that allow easy traversal of the Terms
(base terms or actual terms) of a Query. It took longer than I expected, not
because it's hard to collect terms, but because it was a bit tricky to make it
both intuitive to use and powerful enough to work in most cases.

The result is a bit heavy in that I had to add a few new classes, but changes
to existing classes were fairly minimal. And I think this should solve one of
the problems in writing highlighters.

I'd love to get feedback on the implementation, plus of course if/when bugs
are found I need to fix them.

Anyway, there are basically 3 ways to access the Terms of a Query:

- Use SimpleTermCollector's collectBaseTerms(); this will fill a Collection
with all base Terms (unexpanded Terms; wildcard query terms still contain
"*" and "?" etc).
- Use ActualTermCollector's collectActualTerms() (you need to pass an
IndexReader); it works like collectBaseTerms() but collects all actual terms
(base Terms expanded to the Terms found in the index, which is accessed using
the passed-in IndexReader).
- Get a TermQueryIterator using ActualTermCollector's termQueryIterator()
method.
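
To make the base/actual distinction concrete, here is a rough,
self-contained sketch. It does not use the attached classes or any Lucene
API: Node, collectBaseTerms() and collectActualTerms() below are hypothetical
stand-ins written for illustration, and the index is modeled as a plain list
of "field:text" strings instead of an IndexReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

class TermCollectionSketch {
    /** Hypothetical query node: a leaf (field + text) or a group of clauses. */
    static final class Node {
        final String field;       // null for a group node
        final String text;        // may contain '*' and '?' wildcards
        final List<Node> clauses; // empty for a leaf

        private Node(String field, String text, List<Node> clauses) {
            this.field = field;
            this.text = text;
            this.clauses = clauses;
        }
        static Node leaf(String field, String text) {
            return new Node(field, text, new ArrayList<Node>());
        }
        static Node group(Node... clauses) {
            return new Node(null, null, Arrays.asList(clauses));
        }
        boolean isLeaf() { return field != null; }
    }

    /** Base terms: every leaf as written, wildcards left unexpanded. */
    static void collectBaseTerms(Node q, Collection<String> out) {
        if (q.isLeaf()) {
            out.add(q.field + ":" + q.text);
            return;
        }
        for (Node c : q.clauses) collectBaseTerms(c, out);
    }

    /** Actual terms: every leaf expanded to the matching terms in the toy index. */
    static void collectActualTerms(Node q, List<String> index, Collection<String> out) {
        if (!q.isLeaf()) {
            for (Node c : q.clauses) collectActualTerms(c, index, out);
            return;
        }
        Pattern p = Pattern.compile(Pattern.quote(q.field + ":") + wildcardToRegex(q.text));
        for (String term : index) {
            if (p.matcher(term).matches()) out.add(term);
        }
    }

    /** Translates '*' and '?' into regex form and quotes everything else. */
    static String wildcardToRegex(String text) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (ch == '*') sb.append(".*");
            else if (ch == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(ch)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList(
            "body:lucene", "body:highlight", "body:highlighter", "title:highway");
        Node query = Node.group(Node.leaf("body", "lucene"), Node.leaf("body", "high*"));

        List<String> base = new ArrayList<String>();
        collectBaseTerms(query, base);
        System.out.println(base);   // [body:lucene, body:high*]

        List<String> actual = new ArrayList<String>();
        collectActualTerms(query, index, actual);
        System.out.println(actual); // [body:lucene, body:highlight, body:highlighter]
    }
}
```

A highlighter would normally want the second list, since those are the
strings that can actually occur in the document text.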

In the simplest cases the first 2 methods are enough. However, if more
information about a Term's context and type is needed, the Iterator gives
full access to most info you might want to know (you can check which Query a
Term was contained in, and whether that Query is
required/prohibited/optional).
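
The kind of context an iterator can expose might look roughly like the
sketch below. This is not the TermQueryIterator API from the attachment; its
method names are not given in this message, so Node, Clause, TermContext and
termIterator() are all hypothetical names, and the sketch simply reports the
group that directly contains each term plus that clause's
required/prohibited/optional flag.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TermContextSketch {
    enum Occur { REQUIRED, PROHIBITED, OPTIONAL }

    /** Hypothetical query node: a leaf term, or a named group of clauses. */
    static final class Node {
        final String label; // term text for a leaf, a name for a group
        final List<Clause> clauses = new ArrayList<Clause>();

        Node(String label) { this.label = label; }

        Node add(Node child, Occur occur) {
            clauses.add(new Clause(child, occur));
            return this;
        }
        boolean isLeaf() { return clauses.isEmpty(); }
    }

    /** A child node together with how it occurs in its parent. */
    static final class Clause {
        final Node node;
        final Occur occur;
        Clause(Node node, Occur occur) { this.node = node; this.occur = occur; }
    }

    /** What the iterator yields: the term, its containing group, and its flag. */
    static final class TermContext {
        final String term;
        final Node parent;
        final Occur occur;
        TermContext(String term, Node parent, Occur occur) {
            this.term = term;
            this.parent = parent;
            this.occur = occur;
        }
    }

    /** Depth-first walk over a group node; assumes the root is a group, not a leaf. */
    static Iterator<TermContext> termIterator(Node root) {
        List<TermContext> out = new ArrayList<TermContext>();
        walk(root, out);
        return out.iterator();
    }

    private static void walk(Node group, List<TermContext> out) {
        for (Clause c : group.clauses) {
            if (c.node.isLeaf()) out.add(new TermContext(c.node.label, group, c.occur));
            else walk(c.node, out);
        }
    }

    public static void main(String[] args) {
        Node inner = new Node("inner").add(new Node("highlight"), Occur.OPTIONAL);
        Node root = new Node("root")
            .add(new Node("lucene"), Occur.REQUIRED)
            .add(new Node("spam"), Occur.PROHIBITED)
            .add(inner, Occur.OPTIONAL);

        for (Iterator<TermContext> it = termIterator(root); it.hasNext();) {
            TermContext tc = it.next();
            System.out.println(tc.term + " in " + tc.parent.label + " is " + tc.occur);
        }
    }
}
```

With this information a highlighter could, for example, skip terms whose
clause is prohibited, which a flat Collection of terms cannot express.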

I haven't added full test cases yet, but ActualTermCollector has a main()
method that does simple testing on the user's input. It also shows how to
traverse Query Terms using a TermQueryIterator (and the base/actual term
iterators it can give).

-+ Tatu +-

ps. About the attachments: the zip file contains the new classes, which
  belong in the org.apache.lucene.search package; the txt file contains
  patches taken from org/apache/lucene/.
