lucene-dev mailing list archives

From Daniel Shane <sha...@lexum.com>
Subject Adding Support for occurances in SpanQueries
Date Wed, 09 Mar 2011 21:52:16 GMT
I'm currently working on a project that involves highlighting all the words in a document that
match a given Query.

Right now, there is a highlighter in Lucene, but all it does, I think, is to take the query,
extract the terms out of it, and highlight every term.

I presume this is what everyone usually wants, but in my case, what I want is to match every
word that is actually part of the query's internal evaluation.

For example, let's say I used a SpanNearNot query: I would not want to highlight the terms
in the spans that were excluded.

I was thinking of adding this feature to the SpanQueries, since they share an API that regular
Queries do not have: getSpans().

Regular queries, I think, do not allow us to get the positions of the matched elements in
the query (if any matched) so I would not touch these.

Considering SpanQueries have the getSpans() method, I wanted to add this API to Spans:

*****************************

import java.io.IOException;
import java.util.Collection;

public abstract class Spans {
  // Existing API
  public abstract boolean next() throws IOException;
  public abstract boolean skipTo(int target) throws IOException;
  public abstract int doc();
  public abstract int start();
  public abstract int end();

  public abstract Collection<byte[]> getPayload() throws IOException;
  public abstract boolean isPayloadAvailable();

  // NEW STUFF HERE: the terms that made the current span match
  public abstract Collection<SpanMatchedTerm> getSpanMatchedTerms();
}

import org.apache.lucene.index.Term;

public class SpanMatchedTerm {
    public Term term;
    public String displayName;
    public int position;

    /**
     * Creates a SpanMatchedTerm. The displayName is an optional name that
     * refers to this query. Used when term.text() is not enough.
     * A good example would be when you stem terms.
     * You could use the displayName as the non-stemmed text, which
     * you would use afterwards to display this match.
     */
    public SpanMatchedTerm(Term term, String displayName, int position) {
        this.term = term;
        this.position = position;
        this.displayName = displayName;
    }
}

******************************

So basically, I can create a SpanQuery, then call getSpans() on it, cycle through the spans,
each time calling getSpanMatchedTerms() to get the individual terms that allowed this span
to match. 
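
A rough sketch of that loop, written against stand-in types rather than Lucene itself so that
it compiles on its own. MatchedTerm, MatchSpans, singleSpan and collectHighlights are
hypothetical names used only for illustration; MatchSpans mirrors just the subset of Spans the
loop needs plus the proposed getSpanMatchedTerms() method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class MatchedTermsSketch {

    // Stand-in for the proposed SpanMatchedTerm; Lucene's Term is reduced to its text.
    static final class MatchedTerm {
        final String text;
        final String displayName;
        final int position;

        MatchedTerm(String text, String displayName, int position) {
            this.text = text;
            this.displayName = displayName;
            this.position = position;
        }
    }

    // Hypothetical subset of Spans plus the proposed method (not Lucene's actual class).
    interface MatchSpans {
        boolean next();
        int doc();
        Collection<MatchedTerm> getSpanMatchedTerms();
    }

    // Walks every span and records "doc:position:text" for each contributing term.
    static List<String> collectHighlights(MatchSpans spans) {
        List<String> out = new ArrayList<>();
        while (spans.next()) {
            for (MatchedTerm t : spans.getSpanMatchedTerms()) {
                out.add(spans.doc() + ":" + t.position + ":" + t.text);
            }
        }
        return out;
    }

    // In-memory spans holding exactly one span, used only to drive the loop.
    static MatchSpans singleSpan(final int doc, final List<MatchedTerm> terms) {
        return new MatchSpans() {
            private boolean consumed = false;

            public boolean next() {
                if (consumed) {
                    return false;
                }
                consumed = true;
                return true;
            }

            public int doc() {
                return doc;
            }

            public Collection<MatchedTerm> getSpanMatchedTerms() {
                return terms;
            }
        };
    }

    public static void main(String[] args) {
        MatchSpans spans = singleSpan(7, Arrays.asList(
                new MatchedTerm("quick", null, 3),
                new MatchedTerm("fox", null, 5)));
        System.out.println(collectHighlights(spans));
    }
}
```

With the real classes, the loop body would stay the same and only the source of the spans
would change (SpanQuery.getSpans() instead of the in-memory stand-in).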

getSpanMatchedTerms() would work just like getPayload(), except that it would return the positions
of the matched terms along with whatever optional displayName you attached to each term.

The displayName is useful if you want to write a SpanWildcardQuery() that mimics the WildcardQuery.
In that case, you would like to highlight every term, but if you want to show a navigation
bar to cycle through hits, you want to show the original term with the wildcard in it, not
every different term that matched.
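
To illustrate the navigation-bar case, here is a small sketch that groups matched positions
under their displayName, falling back to the term text when no displayName was set. Match and
groupForNavigation are hypothetical names, and Match is only a stand-in for the proposed
SpanMatchedTerm with the Term reduced to its text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DisplayNameGrouping {

    // Stand-in for the proposed SpanMatchedTerm (hypothetical, Term reduced to its text).
    static final class Match {
        final String text;
        final String displayName; // may be null when no display name was attached
        final int position;

        Match(String text, String displayName, int position) {
            this.text = text;
            this.displayName = displayName;
            this.position = position;
        }
    }

    // Groups positions by displayName, using the term text when displayName is null.
    // LinkedHashMap keeps the labels in first-seen order.
    static Map<String, List<Integer>> groupForNavigation(List<Match> matches) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Match m : matches) {
            String label = (m.displayName != null) ? m.displayName : m.text;
            List<Integer> positions = groups.get(label);
            if (positions == null) {
                positions = new ArrayList<>();
                groups.put(label, positions);
            }
            positions.add(m.position);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Match> matches = new ArrayList<>();
        matches.add(new Match("tester", "test*", 4));
        matches.add(new Match("testing", "test*", 9));
        matches.add(new Match("lucene", null, 12));
        // prints {test*=[4, 9], lucene=[12]}
        System.out.println(groupForNavigation(matches));
    }
}
```

Every position is still available for highlighting, while the navigation bar only needs the
map keys, so "tester" and "testing" show up once as "test*".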

Do you think this is a good way of going about this problem?

Would it stand a chance of getting included if this implementation was submitted as a patch
along with the fixes to the various Spans*** classes to make it work?

Thanks!
Daniel Shane
