lucene-java-user mailing list archives

From Paul Allan Hill <p...@metajure.com>
Subject Phrase Queries vs. SpanTermQueries exact phrases vs. stop words
Date Tue, 31 Jan 2012 20:48:02 GMT
In Lucene 3.4, I recently implemented "Translating PhraseQuery to SpanNearQuery" (see Lucene
in Action, page 220) because I wanted _order_ to matter.

Here is my exact code, called from getFieldsQuery once I know I'm looking at a PhraseQuery,
though I think it is taken straight from the book.

    static Query buildSpanNearQuery(PhraseQuery phraseQ, int slop) {
        // Only the terms are copied; phraseQ.getPositions() is not consulted.
        Term[] terms = phraseQ.getTerms();
        SpanTermQuery[] clauses = new SpanTermQuery[terms.length];
        for (int i = 0; i < terms.length; i++) {
            clauses[i] = new SpanTermQuery(terms[i]);
        }
        // PHRASE_ORDER_MATTERS (defined elsewhere) is passed as the boolean inOrder argument.
        SpanNearQuery query = new SpanNearQuery(clauses, slop, PHRASE_ORDER_MATTERS);
        return query;
    }

I plugged this into my own QueryParser and things looked good until I tried a phrase containing stop words.
With the old PhraseQuery I got results for a phrase with stop words without increasing the
slop, but with SpanNearQuery nothing is found unless the query includes some slop.
This conflicts with the typical use case of a user pasting a phrase into the search
bar in quotes and expecting to find the document.
I can't just add a fixed amount of slop, because the amount needed depends on how many stop words
occur in each sequence within the phrase.
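
To make the "it depends" part concrete, here is a minimal sketch of the arithmetic, not a tested fix. PositionGaps and extraSlop are hypothetical names, not Lucene API. The positions array is assumed to hold the relative term positions in ascending order, such as PhraseQuery.getPositions() might return when the parser preserves position increments for removed stop words (an assumption about the parser configuration).

```java
// Hypothetical helper, not part of Lucene: counts the position "holes"
// that removed stop words leave between the surviving phrase terms.
final class PositionGaps {

    // positions: relative term positions in ascending order (assumed).
    // Returns how many positions between the first and last term
    // are not occupied by any term of the phrase.
    static int extraSlop(int[] positions) {
        if (positions.length < 2) {
            return 0; // zero or one term: nothing to span
        }
        int span = positions[positions.length - 1] - positions[0];
        // Adjacent terms would span exactly (length - 1) positions.
        return span - (positions.length - 1);
    }
}
```

For example, if "king of rock" keeps only king and rock at positions {0, 2}, there is one hole; if "men of the world" keeps men and world at {0, 3}, there are two. Adding that count to the slop would also admit matches where other words fill the holes, so it would loosen exact matching rather than preserve it.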

Any suggestions on how to combine the idea of SpanNear (so that words appearing in phrase
order are preferred) with text that has had stop words removed, so that I can support
the simple use of quotes for exact quoted-text matching?

Any ideas?

-Paul

