lucene-java-user mailing list archives

From viruslviv <virusl...@gmail.com>
Subject SpanNearQuery doesn't return document if the same word within query is repeated
Date Thu, 30 Dec 2010 13:58:06 GMT

Hello Lucene community!

I have been working with Solr/Lucene for about half a year, and I have run
into an interesting issue with SpanNearQuery queries.


Suppose a document contains the following text (the whole document text is
given below):
"intended recipient of this message or if this message has been addressed"

and the query:
(messag within 3 of address) within 5 of messag within 3 of address. 

I expected this query to return the document, but it did not.

However, judging from the section "What does it mean to require that Spans
come in order and how do SpanQuerys actually match?" of
http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/, it looks like
Lucene does not see the second occurrence of the word "message" but picks up
the first one instead.
When I changed the slop for the last word to 6 ((messag within 3 of address)
within 5 of messag within 6 of address), the document was returned.
Unfortunately, I am not allowed to widen queries at runtime.
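To make the position arithmetic concrete, here is a small self-contained model (plain Java, not Lucene code) of the slop check for an unordered SpanNearQuery, following the query built in the Code section of this message. It assumes that a match covers [min start, max end), that its slop is the covering width minus the summed widths of its sub-spans, and that an unordered match may let both "messag" clauses sit on the same position. It also assumes one position per word within the quoted sentence. The class and method names (SpanSlopSketch, Span, slop, cover, near) are hypothetical.

```java
/**
 * Simplified model of the slop check for an unordered SpanNearQuery.
 * This is NOT Lucene code. Assumptions: a match covers [min start, max end),
 * its slop is (end - start) minus the summed sub-span lengths, and sub-spans
 * may overlap (so both "messag" clauses can sit on the same position).
 */
public class SpanSlopSketch {

    /** A half-open span [start, end) of token positions. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Smallest span covering all sub-spans. */
    static Span cover(Span... subs) {
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (Span s : subs) {
            start = Math.min(start, s.start);
            end = Math.max(end, s.end);
        }
        return new Span(start, end);
    }

    /** Slop of the covering span: its width minus the summed sub-span widths. */
    static int slop(Span... subs) {
        int total = 0;
        for (Span s : subs) {
            total += s.length();
        }
        return cover(subs).length() - total;
    }

    /** The covering span if the slop is within maxSlop, otherwise null (no match). */
    static Span near(int maxSlop, Span... subs) {
        return slop(subs) <= maxSlop ? cover(subs) : null;
    }

    public static void main(String[] args) {
        // Positions in "intended recipient of this message or if this message
        // has been addressed", counting "intended" as position 0.
        Span intend = new Span(0, 1);
        Span messag1 = new Span(4, 5);
        Span messag2 = new Span(8, 9);
        Span address = new Span(11, 12);

        Span inner = near(4, intend, messag1);    // [0,5), slop 3
        Span reused = near(5, inner, messag1);    // second "messag" clause reuses position 4
        Span distinct = near(5, inner, messag2);  // second "messag" clause uses position 8

        // prints "reused   [0,5) + address: slop 6"
        System.out.println("reused   " + reused + " + address: slop " + slop(reused, address));
        // prints "distinct [0,9) + address: slop 2"
        System.out.println("distinct " + distinct + " + address: slop " + slop(distinct, address));
    }
}
```

In this model, the outer slop of 3 is satisfied only by the span that uses the second "message" (slop 2), while the span that reuses the first one needs a slop of 6, which agrees with what happened when the last slop was widened to 6. Whether Lucene's NearSpansUnordered really settles on the overlapping match and never tries the other one is an assumption worth checking against its source.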

Has anyone run into this issue, and can anyone tell me how to work around
it?

Text sample from the document (please note that the Snowball analyzer is
used during indexing; this is the raw text):

The contents of this e-mail message and
any attachments are intended solely for the
addressee(s) and may contain confidential
and/or legally privileged information. If you
are not the intended recipient of this message
or if this message has been addressed to you
in error, please immediately alert the sender
by reply e-mail and then delete this message
and any attachments. If you are not the
intended recipient, you are notified that
any use, dissemination, distribution, copying,
or storage of this message or any attachment
is strictly prohibited.


Code:

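import java.io.File;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;

public class SpanNearRepro {
    // Hypothetical field name: the original snippet does not show how BODY is defined.
    private static final String BODY = "body";

    public static void main(String... args) throws Exception {
        // Stemmed terms, because the index was built with the Snowball analyzer.
        // "intend" within 4 positions of "messag", in any order
        SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term(BODY, "intend")),
                new SpanTermQuery(new Term(BODY, "messag"))},
                4, false);
        // ... within 5 positions of "messag", in any order
        SpanNearQuery spanNear2 = new SpanNearQuery(new SpanQuery[] {
                spanNear, new SpanTermQuery(new Term(BODY, "messag"))}, 5, false);
        // ... within 3 positions of "address", in any order
        SpanNearQuery spanNear3 = new SpanNearQuery(new SpanQuery[] {
                spanNear2, new SpanTermQuery(new Term(BODY, "address"))}, 3, false);

        Directory directory = new SimpleFSDirectory(new File("C:\\20\\index"));
        IndexSearcher searcher = new IndexSearcher(directory);
        // Track scores even though an explicit Sort is passed to search()
        searcher.setDefaultFieldSortScoring(true, false);
        TopDocs results = searcher.search(spanNear3, null, 20, Sort.RELEVANCE);

        for (ScoreDoc sd : results.scoreDocs) {
            System.out.println("Doc id: " + sd.doc + ", score: " + sd.score);
        }
        searcher.close();
        directory.close();
    }
}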

-- 
View this message in context: http://lucene.472066.n3.nabble.com/SpanNearQuery-doesn-t-return-document-if-the-same-word-within-query-is-repeated-tp2167618p2167618.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
