lucene-java-user mailing list archives

From Igor Shalyminov <ishalymi...@yandex-team.ru>
Subject Re: Question on payload matching query
Date Wed, 05 Jun 2013 13:46:51 GMT
Hi all!

Before diving into the core Lucene code, I would like to ask once again whether there are detailed
tutorials on the SpanQuery execution algorithm, including postings retrieval and positional data matching.

Best Regards,
Igor

03.06.13, 21:15, "Igor Shalyminov" <ishalyminov@yandex-team.ru>:
> 
> Hello!
> 
> I've implemented a SpanQuery class that acts like SpanPositionCheckQuery but also matches
> payloads.
> For example, here is the "gram" field in a single indexed document:
> 
> "gram": N|1|1    sg|1|0    A|2|0    pl|2|0    A|3|0    sg|3|0
> 
> Each token has three parts, separated by "|":
> N - grammatical annotation | 1 - parse number (payload) | 1 - position increment
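To make the token layout concrete, here is a small hypothetical parser (GramTokens, GramToken and parse are illustrative names, not Lucene APIs). It splits each annotation|parse|increment token and turns the increments into 0-based word positions. It assumes whitespace-separated tokens exactly as shown above and does not model Lucene's analysis chain; it only demonstrates that all six tokens end up at one position.

```java
import java.util.ArrayList;
import java.util.List;

public class GramTokens {
    /** One token of the "annotation|parse|increment" format (hypothetical type). */
    static final class GramToken {
        final String annotation; // grammatical annotation, e.g. "N" or "sg"
        final int parse;         // parse number, stored in the index as the payload
        final int position;      // 0-based absolute word position

        GramToken(String annotation, int parse, int position) {
            this.annotation = annotation;
            this.parse = parse;
            this.position = position;
        }

        @Override
        public String toString() {
            return annotation + " (parse " + parse + ", position " + position + ")";
        }
    }

    /** Parses the field and accumulates position increments into positions. */
    static List<GramToken> parse(String field) {
        List<GramToken> tokens = new ArrayList<>();
        int position = -1; // so that a first increment of 1 yields position 0
        for (String raw : field.trim().split("\\s+")) {
            String[] parts = raw.split("\\|");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Malformed token: " + raw);
            }
            position += Integer.parseInt(parts[2]);
            tokens.add(new GramToken(parts[0], Integer.parseInt(parts[1]), position));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (GramToken t : parse("N|1|1 sg|1|0 A|2|0 pl|2|0 A|3|0 sg|3|0")) {
            System.out.println(t);
        }
    }
}
```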
> 
> So the document has a single word position with 3 ambiguous parses: #1, #2,
> and #3. Each parse has 2 annotations: "N, sg", "A, pl", and "A, sg".
> My SpanQuery must not match annotations from different parses: e.g. "sg
> & pl" should not match, but "N & sg" should.
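The intended rule can be stated independently of Lucene: two annotations match only if some pair of tokens carrying them shares both the word position and the parse number. The sketch below is hypothetical (SameParseMatcher and matchesSameParse are not Lucene classes) and, like the example document, assumes every token sits at one word position, so only the parse number is compared.

```java
import java.util.ArrayList;
import java.util.List;

public class SameParseMatcher {
    /**
     * Returns true if annotations a and b occur within the same parse.
     * Assumes every token sits at one word position (as in the example
     * document), so only the parse number has to agree.
     */
    static boolean matchesSameParse(String field, String a, String b) {
        List<String[]> tokens = new ArrayList<>();
        for (String raw : field.trim().split("\\s+")) {
            tokens.add(raw.split("\\|")); // [annotation, parse, increment]
        }
        for (String[] x : tokens) {
            if (!x[0].equals(a)) {
                continue;
            }
            for (String[] y : tokens) {
                boolean sameParse = Integer.parseInt(y[1]) == Integer.parseInt(x[1]);
                if (y != x && y[0].equals(b) && sameParse) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String gram = "N|1|1 sg|1|0 A|2|0 pl|2|0 A|3|0 sg|3|0";
        System.out.println("N & sg:  " + matchesSameParse(gram, "N", "sg"));  // true
        System.out.println("sg & pl: " + matchesSameParse(gram, "sg", "pl")); // false
    }
}
```

Both "sg" tokens (parses 1 and 3) differ from the single "pl" token (parse 2), which is why "sg & pl" is rejected under this rule.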
> 
> The logic is:
> 
>   @Override
>   protected AcceptStatus acceptPosition(Spans spans) throws IOException {
>     // Without payloads there is nothing to check, so accept the position.
>     if (!spans.isPayloadAvailable()) {
>       return AcceptStatus.YES;
>     }
>     // Reject the match if any payload carries a different parse number
>     // from the first one.
>     Collection<byte[]> payloads = spans.getPayload();
>     Integer firstParse = null;
>     for (byte[] payload : payloads) {
>       int parse = PayloadHelper.decodeInt(payload, 0);
>       if (firstParse == null) {
>         firstParse = parse;
>       } else if (parse != firstParse) {
>         return AcceptStatus.NO;
>       }
>     }
>     return AcceptStatus.YES;
>   }
> 
> Then, for the query "sg & pl", which is a wrapped unordered SpanNearQuery, ParseMatchingSpanQuery(SpanNearQuery("gram:sg",
> "gram:pl", false, -1)), acceptPosition is called the first time with a payload collection containing
> ['1', '2'], and the second time with just ['3']. The second call is accepted, which is
> totally unintuitive to me.
> As I understand it, it should be called with pairs of spans, ideally ['1', '2'] and ['3',
> '2']. Why isn't it? :)
> Could you please explain the logic of span matching with payload checks?
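Pulling the payload check out of Lucene makes its behaviour on the two observed calls easy to test. The sketch below re-implements the same rule over plain byte arrays. encodeInt and decodeInt are local stand-ins assumed to match PayloadHelper's 4-byte big-endian layout, and PayloadCheck and allSameParse are hypothetical names. It shows that a collection holding a single payload, such as ['3'], passes trivially, so the accepted second call reflects how many payloads the spans reported rather than a genuine same-parse "sg & pl" pair. It does not model how Lucene's unordered near spans collect payloads.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class PayloadCheck {
    // Local stand-ins, assumed to mirror PayloadHelper: a 4-byte big-endian int.
    static byte[] encodeInt(int v) {
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    static int decodeInt(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
                | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
    }

    /** Same rule as acceptPosition: every payload must carry the same parse number. */
    static boolean allSameParse(Collection<byte[]> payloads) {
        Integer first = null;
        for (byte[] p : payloads) {
            int parse = decodeInt(p, 0);
            if (first == null) {
                first = parse;
            } else if (parse != first) {
                return false;
            }
        }
        return true; // an empty or single-element collection always passes
    }

    public static void main(String[] args) {
        List<byte[]> firstCall = Arrays.asList(encodeInt(1), encodeInt(2));
        List<byte[]> secondCall = Collections.singletonList(encodeInt(3));
        System.out.println(allSameParse(firstCall));  // false
        System.out.println(allSameParse(secondCall)); // true
    }
}
```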
>  
> -- 
> Best Regards,
> Igor
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
