lucene-java-user mailing list archives

From Igor Shalyminov
Subject Question on payload matching query
Date Mon, 03 Jun 2013 17:14:30 GMT

I've implemented a SpanQuery class that acts like SpanPositionCheckQuery but also matches
For example, here is the "gram" field in a single indexed document:

"gram": N|1|1    sg|1|0    A|2|0    pl|2|0    A|3|0    sg|3|0

Every token's meaning is as follows:
N - grammatical annotation | 1 - parse number (payload) | 1 - position increment

So the document has a single word position with 3 ambiguous parses: #1, #2, and #3.
The parses carry 2 annotations each: "N, sg", "A, pl", and "A, sg", respectively.
My SpanQuery is supposed not to match annotations from different parses, e.g. "sg &
pl" should not be matched, but "N & sg" should be.
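
To make the intended semantics concrete, here is a minimal stand-alone sketch that does not use Lucene at all. It parses the token notation from above and checks whether all requested annotations occur within one parse at one position. The class and method names are hypothetical and exist only for this illustration; the position counter starting at -1 is an assumption that mirrors how a first increment of 1 yields position 0.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone model of the "gram" field; no Lucene involved.
class ParseMatchSketch {

  // Returns true if all wanted annotations occur in one parse at one position.
  static boolean matchesWithinOneParse(String field, String... wanted) {
    // Key: "position:parse", value: annotations seen for that parse.
    Map<String, Set<String>> byParse = new HashMap<>();
    int position = -1;
    for (String token : field.trim().split("\\s+")) {
      // Token layout: annotation | parse number | position increment
      String[] parts = token.split("\\|");
      position += Integer.parseInt(parts[2]);
      String key = position + ":" + parts[1];
      byParse.computeIfAbsent(key, k -> new HashSet<>()).add(parts[0]);
    }
    for (Set<String> annotations : byParse.values()) {
      if (annotations.containsAll(Arrays.asList(wanted))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String gram = "N|1|1    sg|1|0    A|2|0    pl|2|0    A|3|0    sg|3|0";
    System.out.println(matchesWithinOneParse(gram, "N", "sg"));   // same parse (#1)
    System.out.println(matchesWithinOneParse(gram, "sg", "pl"));  // different parses
  }
}
```

With the example field, "N & sg" is found inside parse #1, while "sg & pl" is never found inside a single parse.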

The logic is:

  protected AcceptStatus acceptPosition(Spans spans) throws IOException {
    // Without payloads there is nothing to compare, so the span is accepted.
    boolean result = spans.isPayloadAvailable();
    if (result) {
      Collection<byte[]> payloads = spans.getPayload();
      int first_payload = PayloadHelper.decodeInt(payloads.iterator().next(), 0);
      // Reject as soon as any payload (parse number) differs from the first one.
      for (byte[] payload : payloads) {
        int decoded_payload = PayloadHelper.decodeInt(payload, 0);
        if (decoded_payload != first_payload) {
          return AcceptStatus.NO;
        }
      }
    }
    return AcceptStatus.YES;
  }
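
The check itself can be exercised outside of Lucene. Below is a rough sketch under stated assumptions: the encode/decode helpers are hypothetical stand-ins for PayloadHelper.encodeInt and PayloadHelper.decodeInt, and they assume a 4-byte big-endian layout. It shows one property of the check that follows directly from the code: a collection holding a single payload always passes, because there is nothing to differ from.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

// Hypothetical stand-alone version of the same-parse check; no Lucene involved.
class SameParseCheckSketch {

  // Stand-in for PayloadHelper.encodeInt: 4 bytes, big-endian (assumed layout).
  static byte[] encode(int parse) {
    return ByteBuffer.allocate(4).putInt(parse).array();
  }

  // Stand-in for PayloadHelper.decodeInt(payload, 0).
  static int decode(byte[] payload) {
    return ByteBuffer.wrap(payload).getInt();
  }

  // True if every payload decodes to the same parse number.
  static boolean sameParse(Collection<byte[]> payloads) {
    if (payloads.isEmpty()) {
      return true;
    }
    int first = decode(payloads.iterator().next());
    for (byte[] payload : payloads) {
      if (decode(payload) != first) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Payloads from parses 1 and 2: rejected.
    System.out.println(sameParse(Arrays.asList(encode(1), encode(2))));
    // A single payload from parse 3: accepted trivially.
    System.out.println(sameParse(Collections.singletonList(encode(3))));
  }
}
```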

Then, for the query "sg & pl", which is a wrapped unordered SpanNearQuery,
ParseMatchingSpanQuery(SpanNearQuery("gram:sg", "gram:pl", false, -1)), acceptPosition is called
the first time with a payload collection containing ['1', '2'], and the second time with just ['3'].
The second call is accepted, so the query matches, and that is totally unintuitive to me.
To my understanding, it should be called with pairs of spans, ideally ['1', '2'] and ['1', '3'].
Why is it not? :)
Could you please explain to me the logic of matching with payload checking?
Best Regards,
