From Trejkaz
Subject Is it possible to rewrite a MultiPhraseQuery to a SpanQuery?
Date Tue, 19 Aug 2014 01:48:58 GMT
Someone asked if it was possible to do a SpanNearQuery between a
TermQuery and a MultiPhraseQuery.

Sadly, you can only use SpanNearQuery with other instances of
SpanQuery, so we have a gigantic method where we rewrite as many
queries as possible to SpanQuery. For instance, TermQuery can
trivially rewrite to SpanTermQuery. PhraseQuery can rewrite to an
ordered SpanNearQuery with the same slop as the original query. So
it's quite possible to do queries like:

    business w/2 "north park"~2

Which gets parsed to:

    within 2
      term "business"
      slop, value 2
        phrase-query "north park"

Which we then rewrite to:

    span-near, unordered, slop 3
      span-term "business"
      span-near, ordered, slop 2
        span-term "north"
        span-term "park"

(Note: the ordered span-near is slightly different to real PhraseQuery
semantics, but more explainable.)
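
To make the two rewrite rules above concrete, here is a minimal sketch in
Python over a toy query model. It is not the Lucene API: the classes Term,
Phrase, SpanTerm and SpanNear and the function to_span are hypothetical
stand-ins, and only the two cases described in this post (term and phrase)
are handled.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-ins for TermQuery and PhraseQuery.
@dataclass(frozen=True)
class Term:
    text: str


@dataclass(frozen=True)
class Phrase:
    terms: Tuple[str, ...]
    slop: int


# Hypothetical stand-ins for SpanTermQuery and SpanNearQuery.
@dataclass(frozen=True)
class SpanTerm:
    text: str


@dataclass(frozen=True)
class SpanNear:
    clauses: tuple
    slop: int
    ordered: bool


def to_span(query):
    """Rewrite a toy query into its span equivalent."""
    if isinstance(query, Term):
        # A term maps directly to a span-term.
        return SpanTerm(query.text)
    if isinstance(query, Phrase):
        # A phrase becomes an ordered span-near with the same slop.
        clauses = tuple(SpanTerm(t) for t in query.terms)
        return SpanNear(clauses, query.slop, True)
    raise TypeError("no span rewrite for " + type(query).__name__)
```

Calling to_span(Phrase(("north", "park"), 2)) yields an ordered SpanNear with
slop 2 over the two span-terms, which is the inner node of the tree above.
Any other query type raises TypeError, which is where MultiPhraseQuery
currently ends up.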

However, MultiPhraseQuery is posing a problem. MultiPhraseQuery comes
out of the query parser when certain types of analyser are being used.
For instance, if you parse the query 秋葉原 using the Japanese analyser,
you will get a query tree like this:

    slop, value 0
        term 秋葉原, position increment 0
        term 秋葉, position increment 0
        term 原, position increment 1

Other posts on the mailing list suggest that I can handle the terms at
the same position by creating an unordered SpanNearQuery with
slop=-1. I can then wrap that in an ordered SpanNearQuery together with
the term at position increment 1:

    span-near, ordered, slop 0
      span-near, unordered, slop -1
        span-term 秋葉原
        span-term 秋葉
      span-term 原
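
The grouping step behind this tree can be sketched as follows, again in
Python over plain tuples rather than the Lucene API. The function names
group_by_position and to_span_tree are hypothetical, and the sketch assumes
positions are simply the running sum of the position increments starting
from 0. It only reproduces the shape shown above and makes no attempt to
handle gaps between positions.

```python
def group_by_position(terms):
    """Group (text, position_increment) pairs by cumulative position."""
    groups = {}
    position = 0
    for text, increment in terms:
        position += increment
        groups.setdefault(position, []).append(text)
    return sorted(groups.items())


def to_span_tree(terms, slop=0):
    """Build the nested span-near shape as plain tuples."""
    clauses = []
    for _, texts in group_by_position(terms):
        spans = [("span-term", t) for t in texts]
        if len(spans) == 1:
            # A single term at this position needs no wrapper.
            clauses.append(spans[0])
        else:
            # Terms sharing a position: unordered span-near, slop -1.
            clauses.append(("span-near", "unordered", -1, spans))
    # The positions themselves are matched in order.
    return ("span-near", "ordered", slop, clauses)
```

With the input [("秋葉原", 0), ("秋葉", 0), ("原", 1)], group_by_position puts
the first two terms at position 0 and the last at position 1, and
to_span_tree nests them in the same way as the tree above.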

The problem I can see is that the inner queries could have a position
increment > 1, and the slop on the whole thing could be non-zero as
well. I can't figure out how to express this in span queries.

Is there a way?


