lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rene Hackl-Sommer <rene.a.ha...@gmx.de>
Subject Re: Increase number of available positions?
Date Wed, 17 Mar 2010 15:17:21 GMT
Hi,

I was looking at SpanNotQuery to see if I could make do without the 
position increment gaps. A search requirement that's causing me some 
trouble to implement is when two terms are supposed to be on the same 
L_2, yet on different L_3's (L_3's are hierarchically below L_2).

With the position increments in place, I can do this:

<SpanNot fieldName="MyField">
<Include>
<SpanNear slop="100000" inOrder="false">
<SpanTerm>t293</SpanTerm>
<SpanTerm>t4979</SpanTerm>
</SpanNear>
</Include>
<Exclude>
<SpanNear slop="1000" inOrder="false">
<SpanTerm>t293</SpanTerm>
<SpanTerm>t4979</SpanTerm>
</SpanNear>
</Exclude>
</SpanNot>

This query returns the expected documents.

I didn't manage to come up with a working solution for the approach 
without posIncGaps. The following, I thought, should work, but for some 
reason it doesn't:

<SpanNot fieldName="MyField">
<Include>
<!-- Gets all the matching spans within L_2 boundaries and includes them -->
<SpanNot>
<Include>
<SpanNear slop="2147483647" inOrder="false" >
<SpanTerm>t293</SpanTerm>
<SpanTerm>t4979</SpanTerm>
</SpanNear>
</Include>
<Exclude>
<SpanTerm>L_2</SpanTerm>
</Exclude>
</SpanNot>
</Include>
<Exclude>
<!-- Gets all the matching spans from L_3 boundaries and excludes them -->
<SpanNot>
<Include>
<SpanNear slop="2147483647" inOrder="false" >
<SpanTerm>t293</SpanTerm>
<SpanTerm>t4979</SpanTerm>
</SpanNear>
</Include>
<Exclude>
<SpanTerm>L_3</SpanTerm>
</Exclude>
</SpanNot>
</Exclude>
</SpanNot>

Shouldn't this query only leave documents, where t293 and t4979 are in 
the same L_2, but not within the same L_3? I fiddled about with 
different queries to no avail and I feel the above is the most 
straightforward try. But the query doesn't match any document at all.

Any ideas on how to improve the second query would be greatly appreciated.

Thanks
Rene

> Hi Rene,
>
> Have you seen SpanNotQuery?:
>
> <http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/spans/SpanNotQuery.html>
>
> For a document that looks like:
>
> <Level_1 id="1">
>    <Level_2 id="1">
>      <Level_3 id="1">T1 T2 T3</Level_3>
>      <Level_3 id="2">T4 T5 T6</Level_3>
>      <Level_3 id="3">T7 T8 T9</Level_3>
>    </Level_2>
>    <Level_2 id="2">
>      <Level_3 id="1">T10 T11 T12</Level_3>
>      <Level_3 id="2">T13 T14 T15</Level_3>
>      <Level_3 id="3">T16 T17 T18</Level_3>
>    </Level_2>
>    ...
> </Level1>
> ...
>
> You could generate the following token stream (L_X being a concrete level boundary token):
>
> L_1 L_2 L_3 T1  T2  T3  L_3 T4  T5  T6  L_3 T7  T8  T9
>      L_2 L_3 T10 T11 T12 L_3 T13 T14 T15 L_3 T16 T17 T18
>      L_2 ...
> ...
>
> A query to find T2 and T8 on the same Level_2 would require you to find a span containing
T2 and T8, but not containing L_2.
>
> This scheme will generalize to as many levels as you need, and you can use nested span
queries to simultaneously provide constraints at multiple levels.  No position increment gap
required.
>
> Caveat: this scheme is not tested - I could be way off base :).
>
> Steve
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>    


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message