lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From markharw...@yahoo.co.uk
Subject Re: How to extract matching terms for a document given a query
Date Wed, 16 Jun 2004 21:31:07 GMT
Yes, highlighting multi-term queries does require a query.rewrite() call to expand those terms
before
calling the highlighter.
BUT, you could load the results documents into a temporary RAMDirectory and expand the query
by rewriting it 
against THAT instead of the original index - it would still produce the term expansions you
need. 
The trouble is that would require one pass of the analyzer over the full documents' text.
You would have to tokenize the text again when you called the highlighter unless you wrote
code to cache the 
results of the token stream created while indexing the results into your RamDirectory. This
pre-tokenized stream
could then be passed to the highlighter.

Another approach is to write your own "Scorer" implementation for use with the highlighter
- something like:

  String wildcardTerms[]={"corporat","firm", "business","compan", "invest"}; //....
      //this impl assumes all search terms are wildcards
  public float getTokenScore(Token token)
  {   
     for(int i=0;i<wildcardTerms;i++)
     {
            if(token.termText().startsWith(wildcardTerms[i]))
                  return 1;
     }
      return 0;
}

Wildcard support could be added as default functionality in the QueryScorer object for un-rewritten
queries 
but I wanted to avoid re-implementing the various multi-term queries' logic in there.

Cheers
Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message