lucene-dev mailing list archives

From "Claudio Corsi" <claudio.co...@gmail.com>
Subject Fwd: SpanNearQuery: how to get the "intra-span" matching positions?
Date Fri, 06 Jun 2008 14:23:15 GMT
Hi,
I'm trying to extend the NearSpansOrdered and NearSpansUnordered classes of
the Lucene core to expose the inner positions of the current span (inside a
next() loop). Suppose the current near span starts at position N and ends at
position N+k: I would like to discover the starting/ending positions of all
the inner clauses that generate that span.

I'm working on the NearSpansOrdered class right now. I guess this
modification could be trivial, but it is taking me time to understand the
code. Any hints?

For now (as a very inefficient way to proceed) I've added this method, to be
called *after each next()*, but it doesn't work as expected:

public Spans[] matchingSpans() {
    ArrayList<Spans> list = new ArrayList<Spans>();
    if (subSpans.length == 0) return null;
    for (int pos = 0; pos < subSpans.length; pos++) {
        // Skip sub-spans that are positioned on a different document.
        if (subSpans[pos].doc() != matchDoc) continue;
        // Keep sub-spans that lie entirely inside the current near span.
        if (subSpans[pos].start() >= matchStart
                && subSpans[pos].end() <= matchEnd) {
            list.add(subSpans[pos]);
        }
    }
    return list.toArray(new Spans[0]);
}

As you can see, I'm just looping over the subSpans array, keeping the ones
that have doc() == matchDoc and whose span starts and ends inside the current
near span (matchStart and matchEnd are the boundaries returned by start() and
end() of NearSpansOrdered). This technique doesn't work. Maybe the problem
is that the subSpans are not in the right state after the next() call?
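
One thing that is certain about the method above is that it returns the live sub-span objects, and those are stateful cursors: whatever they report changes as soon as anything calls next() on them again. If the suspicion about their state is right, a possible workaround is to copy the positions as plain values at the moment the match is established. The sketch below illustrates only that idea. ToySpans and Position are hypothetical stand-ins written for this example; they are not Lucene classes, and no claim is made here about where inside NearSpansOrdered the copy would have to be taken.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotSketch {

    /** Hypothetical stand-in for a sub-span: a stateful cursor over {start, end} pairs. */
    static final class ToySpans {
        private final int[][] spans; // each entry is {start, end}
        private int index = -1;

        ToySpans(int[][] spans) { this.spans = spans; }

        boolean next() { return ++index < spans.length; }
        int start() { return spans[index][0]; }
        int end() { return spans[index][1]; }
    }

    /** Immutable copy of the position a cursor was at when the snapshot was taken. */
    static final class Position {
        final int start;
        final int end;

        Position(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Copies the current start/end of every cursor instead of keeping the cursors. */
    static List<Position> snapshot(ToySpans[] subSpans) {
        List<Position> out = new ArrayList<>();
        for (ToySpans s : subSpans) {
            out.add(new Position(s.start(), s.end()));
        }
        return out;
    }

    public static void main(String[] args) {
        ToySpans a = new ToySpans(new int[][] { {0, 1}, {4, 5} });
        a.next();                                   // cursor is now on {0, 1}
        List<Position> saved = snapshot(new ToySpans[] { a });
        a.next();                                   // cursor moves on to {4, 5}
        // The saved value still says 0; the live cursor now says 4.
        System.out.println(saved.get(0).start + " vs live " + a.start());
    }
}
```

The same copy-on-match idea would apply whether the copied values end up in an int[] or in small value objects like Position.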

Thank you for any hints!


---------- Forwarded message ----------
From: Paul Elschot <paul.elschot@xs4all.nl>
Date: Fri, May 30, 2008 at 8:51 PM
Subject: Re: SpanNearQuery: how to get the "intra-span" matching positions?
To: java-user@lucene.apache.org


On Friday 30 May 2008 12:10, Claudio Corsi wrote:
> Hi all,
> I'm querying my index with a SpanNearQuery built on top of some
> SpanOrQuery instances. Now, the Spans object I get from the SpanNearQuery
> instance gives me the sequence of text spans, each defined by
> its starting/ending positions. I'm wondering if there is a simple
> way to get not only the start/end positions of the entire span, but
> also the single matching positions inside that span. For example, suppose
> that a SpanNearQuery composed of 3 SpanTermQuery clauses
> (with a slop of K) produces as its resulting span the term sequence <t0
> t1 t2 t3 .... t100> (so start() == 0, end() == 100). I know for
> sure that t0 and t100 have generated a match, since the span is "minimal"
> (right?).

Right. But make sure to test: some less-than-straightforward situations
are possible when matching spans. For example, the subqueries may
themselves be SpanNearQuery instances instead of SpanTermQuery instances.

> But I also know that there is a third match somewhere in the
> span (I have 3 SpanTermQuery clauses that have to match). Is there a way to
> discover it?

To get this information, you'll have to extend NearSpansOrdered and
NearSpansUnordered (package-private classes in o.a.l.search.spans)
to also provide, for example, an int[] with the actual
matching 'positions', or subspans each with their own begin and end.
This is fairly straightforward, but to actually use such positions,
SpanScorer will also need to be extended or even replaced.
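
To make the int[] idea concrete, here is a small self-contained sketch of an ordered near-match that reports, for each match, the position every clause matched at. It is a toy model under stated assumptions: a single document, each clause a single term with a sorted position list, and a simple forward-then-backward pass chosen for readability. OrderedNearSketch, Match and findMatches are hypothetical names for this illustration; this is not the NearSpansOrdered code and does not use the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: ordered near-matching over term positions within one document. */
public class OrderedNearSketch {

    /** One match: the overall span plus the position each clause matched at. */
    static final class Match {
        final int start;             // position of the first clause
        final int end;               // one past the last clause's position
        final int[] clausePositions; // clausePositions[i] belongs to clause i

        Match(int[] chosen) {
            this.clausePositions = chosen.clone(); // copy: the caller reuses its array
            this.start = chosen[0];
            this.end = chosen[chosen.length - 1] + 1;
        }
    }

    /** positions[i] is the sorted position list of clause i (a single term each). */
    static List<Match> findMatches(int[][] positions, int slop) {
        List<Match> matches = new ArrayList<>();
        int n = positions.length;
        if (n == 0) return matches;
        int[] chosen = new int[n];
        int prevStart = -1;
        for (int first : positions[0]) {
            if (first <= prevStart) continue; // would shrink to a span already seen
            chosen[0] = first;
            boolean complete = true;
            // Forward pass: earliest position of each later clause after the previous one.
            for (int i = 1; i < n; i++) {
                int next = firstGreaterThan(positions[i], chosen[i - 1]);
                if (next < 0) { complete = false; break; }
                chosen[i] = next;
            }
            if (!complete) break; // no later starting point can succeed either
            // Backward pass: pull earlier clauses to the right to get the shortest span.
            for (int i = n - 2; i >= 0; i--) {
                chosen[i] = lastLessThan(positions[i], chosen[i + 1]);
            }
            prevStart = chosen[0];
            int gaps = chosen[n - 1] - chosen[0] - (n - 1); // positions skipped between clauses
            if (gaps <= slop) matches.add(new Match(chosen));
        }
        return matches;
    }

    /** Smallest element greater than value, or -1 if there is none. */
    static int firstGreaterThan(int[] sorted, int value) {
        for (int p : sorted) if (p > value) return p;
        return -1;
    }

    /** Largest element smaller than value, or -1 if there is none. */
    static int lastLessThan(int[] sorted, int value) {
        for (int i = sorted.length - 1; i >= 0; i--) if (sorted[i] < value) return sorted[i];
        return -1;
    }

    public static void main(String[] args) {
        int[][] positions = { {0, 2}, {5, 9}, {7} };
        for (Match m : findMatches(positions, 10)) {
            // prints "2..8 [2, 5, 7]"
            System.out.println(m.start + ".." + m.end + " " + Arrays.toString(m.clausePositions));
        }
    }
}
```

In the example, the first clause occurs at 0 and 2, but only 2 is reported, because the span is shrunk to the shortest one ending at position 7. Nested subqueries, where each clause has its own begin and end rather than a single position, would need the second variant mentioned above (subspans with their own boundaries) instead of a flat int[].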

In case you want to continue this discussion, please do so
on java-dev.

Regards,
Paul Elschot.





-- 
Claudio Corsi
