lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Fwd: SpanNearQuery: how to get the "intra-span" matching positions?
Date Fri, 06 Jun 2008 16:34:09 GMT
See below.

On Friday 06 June 2008 16:23:15, Claudio Corsi wrote:
> Hi,
> I'm trying to extend the NearSpansOrdered and NearSpansUnordered
> classes of the Lucene core in order to access the inner positions
> of the current span (in a next() loop). Suppose that the current
> near span starts at position N and ends at position N+k; I would
> like to discover the starting/ending positions of all the inner
> clauses that generate that span.
>
> I'm working on the NearSpansOrdered class right now. I guess that
> this modification could be trivial to do, but it takes me time
> to understand the code. Any hints about that?
>
> Currently (as a very inefficient way to proceed) I've added this
> method to call *after each next()*, but it doesn't work as expected:
>
> public Spans[] matchingSpans() {
>
>       ArrayList<Spans> list = new ArrayList<Spans>();
>       if (subSpans.length == 0) return null;
>       for(int pos = 0; pos < subSpans.length; pos++) {
>           if (subSpans[pos].doc() != matchDoc) continue;
>           if (subSpans[pos].start() >= matchStart
>                   && subSpans[pos].end() <= matchEnd) {
>               list.add(subSpans[pos]);
>           }
>       }
>       return list.toArray(new Spans[0]);
> }
>
> As you can see, I'm just looping over the subSpans array, keeping
> the ones that have doc() == matchDoc and whose span starts and ends
> inside the current near span (matchStart and matchEnd are the
> boundaries returned by start() and end() of NearSpansOrdered). This
> technique doesn't work. Maybe the problem is that the subSpans are
> not in the right state after the next() call?

Correct. The reason is that a match must be of minimal length,
and to guarantee that, at least the matching subspans at the lowest
position must be advanced beyond its matching position.
This is the same for both the ordered and the unordered case.

So, to implement the matchingSpans() method, it will be necessary
to copy the subspans when they are at the matching position.
This will probably involve some fruitless copying for incomplete
matches that never become a real match.
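
A minimal sketch of the copying idea, not the actual Lucene code:
PositionedSpans, SpanPosition and ToySpans are hypothetical names made
up for illustration, and only the doc()/start()/end() accessors
mentioned in this thread are assumed. The point is that the positions
are copied into immutable values while the subspans are still on the
match, so a later advance of a subspan does not change them.

```java
import java.util.ArrayList;
import java.util.List;

public class MatchingSpansSketch {

    /** Hypothetical stand-in for the three Spans accessors used in this thread. */
    interface PositionedSpans {
        int doc();
        int start();
        int end();
    }

    /** Immutable copy of the position of one subspan (illustrative type). */
    static final class SpanPosition {
        final int clause, doc, start, end;

        SpanPosition(int clause, int doc, int start, int end) {
            this.clause = clause;
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    /** Copies the current position of every subspan; call while they are on the match. */
    static List<SpanPosition> snapshot(PositionedSpans[] subSpans) {
        List<SpanPosition> copy = new ArrayList<>(subSpans.length);
        for (int i = 0; i < subSpans.length; i++) {
            PositionedSpans s = subSpans[i];
            copy.add(new SpanPosition(i, s.doc(), s.start(), s.end()));
        }
        return copy;
    }

    /** Toy mutable cursor, used only to show that the snapshot is independent. */
    static final class ToySpans implements PositionedSpans {
        private int doc, start, end;

        ToySpans(int doc, int start, int end) { moveTo(doc, start, end); }

        void moveTo(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }

        public int doc() { return doc; }
        public int start() { return start; }
        public int end() { return end; }
    }

    public static void main(String[] args) {
        ToySpans first = new ToySpans(7, 3, 4);
        ToySpans second = new ToySpans(7, 5, 6);
        List<SpanPosition> saved = snapshot(new PositionedSpans[] {first, second});

        // Simulate the subspan at the lowest position being advanced past the match.
        first.moveTo(7, 9, 10);

        System.out.println("saved start=" + saved.get(0).start
                + ", live start=" + first.start());
    }
}
```

In a real implementation the snapshot would have to be taken inside the
matching loop, at the moment the subspans are found to be at a
candidate match, which is where the fruitless copies come from.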

There is also a difference between the two cases besides ordering.
In the ordered case, no overlaps between the matching subspans
are allowed, whereas in the unordered case overlaps are allowed.
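
To illustrate the overlap rule, here is a small sketch that checks
whether a sequence of copied subspan positions would be acceptable in
the ordered case. It is an illustration only: the {start, end} array
layout and the class and method names are hypothetical, and it assumes
that end positions are exclusive (one past the last matching
position).

```java
public class OverlapSketch {

    /**
     * Returns true if every {start, end} pair ends at or before the start
     * of the next pair, i.e. the pairs are in order and do not overlap.
     * Assumes end is exclusive.
     */
    static boolean orderedWithoutOverlap(int[][] spans) {
        for (int i = 1; i < spans.length; i++) {
            int prevEnd = spans[i - 1][1];
            int start = spans[i][0];
            if (prevEnd > start) {
                return false; // previous subspan runs into this one
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] disjoint = {{3, 4}, {5, 6}};
        int[][] overlapping = {{3, 6}, {5, 7}};
        System.out.println(orderedWithoutOverlap(disjoint));
        System.out.println(orderedWithoutOverlap(overlapping));
    }
}
```

Under these assumptions, the second set of positions could only be
reported by the unordered case, so code that consumes the copied
positions should not assume they are disjoint there.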

Regards,
Paul Elschot


