lucene-java-user mailing list archives

From Jahangir Anwari <jah...@gmail.com>
Subject Re: Extracting span terms using WeightedSpanTermExtractor
Date Fri, 08 Jul 2011 09:43:54 GMT
After applying the patch, I was able to get the span positions for all the
terms in the query. However, I could not access the positionSpans of each
span term, because they are stored in the package-private PositionSpan
class in WeightedSpanTerm.java, which is not visible outside the package.
I worked around this by making the PositionSpan utility class public and
moving it into its own file in the o.a.l.search.highlight package. I don't
think this is the best solution, and I am open to alternatives.

This is the content of o.a.l.search.highlight.PositionSpan.java:

// Utility class to store a Span
public class PositionSpan {
  int start;
  int end;

  public PositionSpan(int start, int end) {
    this.start = start;
    this.end = end;
  }

  public int getEndPositionSpan() {
    return end;
  }

  public int getStartPositionSpan() {
    return start;
  }
}
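
For illustration only, here is one possible alternative shape for such a
class: an immutable value type with final fields, plus a small helper that
reads the positions through the getters. This is a self-contained sketch
that does not depend on Lucene. The name PositionSpanSketch, its describe()
helper, and the sample positions are all hypothetical and are not part of
the highlighter package.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a public PositionSpan; not Lucene code.
final class PositionSpanSketch {
  private final int start;
  private final int end;

  public PositionSpanSketch(int start, int end) {
    if (start > end) {
      throw new IllegalArgumentException("start must not exceed end");
    }
    this.start = start;
    this.end = end;
  }

  public int getStart() {
    return start;
  }

  public int getEnd() {
    return end;
  }

  // Formats spans as "[start-end, start-end, ...]" using only the getters,
  // the way calling code outside the package would have to.
  public static String describe(List<PositionSpanSketch> spans) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < spans.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      PositionSpanSketch span = spans.get(i);
      sb.append(span.getStart()).append('-').append(span.getEnd());
    }
    return sb.append(']').toString();
  }

  public static void main(String[] args) {
    // Made-up positions, purely for demonstration.
    List<PositionSpanSketch> spans = Arrays.asList(
        new PositionSpanSketch(3, 4),
        new PositionSpanSketch(10, 11));
    System.out.println(describe(spans));
  }
}
```

Keeping the fields final means callers can read a span but cannot change
it after construction, which may matter once the class is visible outside
the package.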

-Jahangir

On Fri, Jul 8, 2011 at 2:47 AM, Mark Miller <markrmiller@gmail.com> wrote:

>
> On Jul 7, 2011, at 5:14 PM, Jahangir Anwari wrote:
>
> > I did notice a strange issue, though. When the query is just a
> > PhraseQuery (e.g. "everlasting glory"), getWeightedSpanTerms() returns all
> > the span terms along with their span positions. But when the query is a
> > BooleanQuery containing phrase and non-phrase terms (e.g. "everlasting
> > glory"+unity), getWeightedSpanTerms() returns all the span terms, but the
> > span positions are returned only for the phrase terms (i.e. "everlasting"
> > and "glory"). The span positions for the non-phrase term (i.e. "unity")
> > are empty. Any ideas why this could be happening?
>
>
> Positions are only collected for "position sensitive" queries. The
> Highlighter framework that I plugged this into already runs through the
> TokenStream one token at a time - to highlight a TermQuery, there is no need
> to consult positions - just highlight every occurrence seen while marching
> through the TokenStream. Which means there is no need to find those
> positions either.
>
> If you are looking for those positions, here is a patch to calculate them
> for TermQuerys as well. If you open a JIRA issue, this seems like a
> reasonable option to add to the class.
>
> Index: lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
> ===================================================================
> --- lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java    (revision 1143407)
> +++ lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java    (working copy)
> @@ -133,7 +133,7 @@
>       sp.setBoost(query.getBoost());
>       extractWeightedSpanTerms(terms, sp);
>     } else if (query instanceof TermQuery) {
> -      extractWeightedTerms(terms, query);
> +      extractWeightedSpanTerms(terms, new SpanTermQuery(((TermQuery)query).getTerm()));
>     } else if (query instanceof SpanQuery) {
>       extractWeightedSpanTerms(terms, (SpanQuery) query);
>     } else if (query instanceof FilteredQuery) {
>
>
> - Mark Miller
> lucidimagination.com
>
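
The explanation quoted above can be sketched without Lucene. The example
below walks a list of tokens once, the way the Highlighter marches through
a TokenStream, and records the position of every occurrence of each query
term. It is a simplified illustration under stated assumptions, not the
highlighter's actual code: TermPositionSketch and collectPositions() are
hypothetical names, tokens are plain strings, and positions are simply
0-based list indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Lucene-free sketch: one pass over the tokens, recording term positions.
final class TermPositionSketch {

  // Returns, for each query term, the 0-based token positions where it occurs.
  // Terms that never occur map to an empty list.
  public static Map<String, List<Integer>> collectPositions(
      List<String> tokens, List<String> queryTerms) {
    Map<String, List<Integer>> positions = new LinkedHashMap<>();
    for (String term : queryTerms) {
      positions.put(term, new ArrayList<>());
    }
    // Single pass: each token is checked against the query terms as it goes by.
    for (int pos = 0; pos < tokens.size(); pos++) {
      List<Integer> hits = positions.get(tokens.get(pos));
      if (hits != null) {
        hits.add(pos);
      }
    }
    return positions;
  }

  public static void main(String[] args) {
    // Made-up token sequence, purely for demonstration.
    List<String> tokens =
        Arrays.asList("everlasting", "glory", "and", "unity", "in", "unity");
    List<String> queryTerms = Arrays.asList("everlasting", "glory", "unity");
    System.out.println(collectPositions(tokens, queryTerms));
  }
}
```

For plain highlighting, the match test inside the loop is all that is
needed, which is why positions for a TermQuery were not collected before
the patch. Recording them is only useful when the caller wants the
positions themselves.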
