lucene-dev mailing list archives

From Tatu Saloranta <t...@hypermall.net>
Subject Re: Query Term Collector (was: Re: New highlighter package available)
Date Fri, 03 Oct 2003 00:53:30 GMT
On Thursday 02 October 2003 15:15, Otis Gospodnetic wrote:
> Korfut, Tatu (if you're watching),
>
> I'm trying to understand what this term collector idea is all about, so
> I looked online for some of your previous discussions on this topic
> from March 2003.  So the patches that both of you sent to lucene-dev
> at some point both implement a term collector.
> What terms do your term collectors collect, could you explain that in
> simple terms, and with an example, please? (I almost broke one of the
> walls in my apartment, when I accidentally smacked it with my head 10
> minutes ago)
>
> If I make a BooleanQuery: Laurel AND Hardy
> What if I make a WildcardQuery: Comed*
> What terms would your collector collect and return in each case?
>
> I only saw Tatu's diff to the existing classes, and noticed that his
> solution includes 5-6 new classes.
> (http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02815/querySrc.zip)

Right. The new classes were for iterating over terms; the changes to Lucene core
were fairly minor (mostly to allow using the visitor pattern to traverse the
query structure and to call the term collector).

In my code I created 3 iterator interfaces that allow traversing the query
structure, as well as both "logical" terms (the ones the Query has, before
expansion or rewrite) and "physical" terms, i.e. terms that actually exist in
the index (expanded query terms). For some queries (simple term query, phrase
query) there is no difference; for others (wildcard queries) there is. Now, if
I remember correctly, iteration happens at 2 levels: the main level is
queries, and then for each query one can iterate through its terms. This is
necessary for highlighting where different original query components
(different phrases etc.) need to be handled differently (different colouring
etc.).

My main goal was to present a uniform interface, so that application code
doesn't have to worry about different kinds of queries, while still having
full power to inspect the Query objects if it chooses to.
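
To make the two-level idea concrete, here is a minimal, self-contained sketch. It is not taken from the patch and uses no Lucene classes: Clause, Occur and highlightColours are hypothetical names invented for illustration. The outer loop walks the query components, the inner loop walks each component's terms, and every non-prohibited component gets its own colour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are assumptions, not Lucene or patch API.
class TwoLevelIterationSketch {

    enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

    // One query component (e.g. a single term or a phrase) plus its +/- flag.
    static final class Clause {
        final Occur occur;
        final List<String> terms;

        Clause(Occur occur, String... terms) {
            this.occur = occur;
            this.terms = Arrays.asList(terms);
        }
    }

    // Level 1: iterate over clauses. Level 2: iterate over each clause's terms.
    // Terms of the same clause share a colour; the palette wraps around.
    static Map<String, String> highlightColours(List<Clause> clauses, String[] palette) {
        Map<String, String> colours = new LinkedHashMap<>();
        int next = 0;
        for (Clause clause : clauses) {
            if (clause.occur == Occur.PROHIBITED) {
                continue; // a matching document cannot satisfy this clause, so skip it
            }
            String colour = palette[next++ % palette.length];
            for (String term : clause.terms) {
                colours.putIfAbsent(term, colour); // first clause to mention a term wins
            }
        }
        return colours;
    }

    public static void main(String[] args) {
        List<Clause> clauses = new ArrayList<>();
        clauses.add(new Clause(Occur.REQUIRED, "laurel", "hardy")); // a phrase
        clauses.add(new Clause(Occur.OPTIONAL, "comedy"));
        clauses.add(new Clause(Occur.PROHIBITED, "chaplin"));
        System.out.println(highlightColours(clauses, new String[] {"yellow", "cyan"}));
    }
}
```

With a flat list of terms the per-component colouring would not be possible, which is the reason for the second level.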

So, at a high level the idea was something like:

QueryIterator qi = query.iterator();
while (qi.hasNext()) {
  // Loop over separate "term queries" (queries that directly contain terms)
  Query q = qi.nextQuery(); // If the query itself is needed?
  // Need to know properties of the clause that contains the term? (for - / +)
  boolean optional = qi.isOptional();
  boolean reqd = qi.isRequired();
  boolean prohib = qi.isProhibited();
  // Iterate over logical (base) terms:
  TermIterator logicalTermIt = qi.baseTermIterator();
  while (logicalTermIt.hasNext()) {
    Term t = logicalTermIt.nextTerm(); // Could display original terms etc.
  }
  // Need an IndexReader to expand Terms
  TermIterator actualTermIt = qi.actualTermIterator(indexReader);
  while (actualTermIt.hasNext()) {
    Term t = actualTermIt.nextTerm();
    // ... Collect all actual terms to match in the doc displayed?
  }
}

The example probably doesn't make much sense on its own, but the code would
allow for actual highlighting support, as well as fairly generic access to
queries, if one wants to display the query structure or such.
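
For the two queries asked about above, the difference between logical and actual terms can be illustrated with a small, self-contained toy. This is only a sketch under stated assumptions: it does not use Lucene or the patch classes, the names logicalTerms and actualTerms are hypothetical, the "index" is just a sorted set of made-up lowercase terms, and a trailing '*' is the only wildcard handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only: none of these names come from Lucene or from the patch.
class TermExpansionSketch {

    // "Logical" terms: what the query itself carries, before expansion.
    static List<String> logicalTerms(String... queryTerms) {
        return new ArrayList<>(Arrays.asList(queryTerms));
    }

    // "Actual" terms: logical terms resolved against the terms in the index.
    // A trailing '*' is treated as a prefix wildcard; anything else must match exactly.
    static List<String> actualTerms(List<String> logical, SortedSet<String> indexTerms) {
        List<String> result = new ArrayList<>();
        for (String term : logical) {
            if (term.endsWith("*")) {
                String prefix = term.substring(0, term.length() - 1);
                // tailSet starts at the first index term >= prefix; terms sharing
                // the prefix are contiguous in sorted order, so stop at the first miss.
                for (String candidate : indexTerms.tailSet(prefix)) {
                    if (!candidate.startsWith(prefix)) {
                        break;
                    }
                    result.add(candidate);
                }
            } else if (indexTerms.contains(term)) {
                result.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical index contents, already lowercased as an analyzer might do.
        SortedSet<String> index = new TreeSet<>(
                Arrays.asList("comedian", "comedy", "hardy", "laurel", "tragedy"));

        // Laurel AND Hardy: logical and actual terms coincide.
        System.out.println(actualTerms(logicalTerms("laurel", "hardy"), index));
        // Comed*: one logical term, several actual terms.
        System.out.println(actualTerms(logicalTerms("comed*"), index));
    }
}
```

In this toy, "laurel" and "hardy" come back unchanged, while "comed*" expands to whichever index terms start with "comed". A highlighter would need the expanded set, since those are the strings that actually occur in the documents.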

I haven't really had time since then (nor immediate need) to work on getting 
support for highlighting, so I'm not really pushing my patches, if/when 
others have more current ones... but if anyone's interested in such 
approaches, I have the patch zip file (plus archives likely have it).

I also didn't realize back then that query rewrite (which I believe did exist 
even then) could be used to simplify the task... interesting approach, and 
I'm glad it works well enough to allow for highlighting to work. I haven't 
checked out Mark's patches but I think it's great someone took time to 
implement this often requested feature.

-+ Tatu +-



