lucene-java-user mailing list archives

From Gregory Dearing <gregdear...@gmail.com>
Subject Re: SpanNearQuery -- bug or feature?
Date Mon, 13 Jan 2014 15:47:26 GMT
Piotr,

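With inOrder set to false (the 'unordered' case), sub-spans are allowed to
overlap and still count as a match, so the single 'bunga' in your second
document can satisfy both clauses. I believe this is a feature.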

It may seem unusual for a term to be 'near' itself, but it may be more
intuitive if you consider spans that are more than one term long.

spanNear(
    [spanNear([contents:test, contents:bunga], 0, true),
     spanNear([contents:bunga, contents:test], 0, true)],
    10, false
)

This searches for two phrases that occur reasonably 'close' to each other.
It should match your second example document ("test bunga test") even
though the two sub-spans overlap on the term 'bunga'.
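
To make the overlap arithmetic concrete, here is a minimal sketch in plain
Java. It is a toy model and not Lucene's implementation: the class
SpanOverlapSketch, its Span type, and the two matching rules are assumptions
made for illustration. A span is a half-open token range [start, end), the
unordered rule compares the enclosing window minus the combined span length
against the slop, and the ordered rule additionally requires the second span
to start after the first one starts.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of span proximity. Hypothetical names; NOT Lucene's code. */
final class SpanOverlapSketch {

    /** Half-open token range [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start;
        }
    }

    /** Finds every contiguous occurrence of the phrase in whitespace-tokenized text. */
    static List<Span> phraseSpans(String text, String... phrase) {
        String[] tokens = text.split("\\s+");
        List<Span> out = new ArrayList<>();
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            boolean hit = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) {
                    hit = false;
                    break;
                }
            }
            if (hit) {
                out.add(new Span(i, i + phrase.length));
            }
        }
        return out;
    }

    /** Assumed unordered rule: (enclosing window) - (combined span length) <= slop. */
    static boolean nearUnordered(List<Span> as, List<Span> bs, int slop) {
        for (Span a : as) {
            for (Span b : bs) {
                int window = Math.max(a.end, b.end) - Math.min(a.start, b.start);
                // Overlapping (or identical) spans give a negative value here,
                // which is why they pass even with slop 0.
                if (window - (a.length() + b.length()) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Assumed ordered rule: b starts after a starts, and the gap between them is <= slop. */
    static boolean nearOrdered(List<Span> as, List<Span> bs, int slop) {
        for (Span a : as) {
            for (Span b : bs) {
                if (b.start > a.start && Math.max(0, b.start - a.end) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] docs = {"test bunga bunga test", "test bunga test"};
        for (String doc : docs) {
            List<Span> bunga = phraseSpans(doc, "bunga");
            System.out.println(doc
                    + " | ordered: " + nearOrdered(bunga, bunga, 0)
                    + " | unordered: " + nearUnordered(bunga, bunga, 0));
        }
    }
}
```

Under this model the ordered check accepts only "test bunga bunga test",
while the unordered check accepts both documents, which is the behaviour
reported in the original question. For "test bunga test" the phrases
"test bunga" and "bunga test" cover [0, 2) and [1, 3); the window is 3 and
the combined length is 4, so the pair passes the unordered rule with any
non-negative slop.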

Also, Mark Miller wrote a really nice article on span mechanics that may be
helpful: http://searchhub.org/2009/07/18/the-spanquery/

-Greg


On Fri, Jan 10, 2014 at 7:01 PM, Piotr Pęzik <piotr.pezik@gmail.com> wrote:

> Hi,
>
> could anyone please tell me if the following behavior is expected in
> Lucene 4.5?
>
> Let's assume we have an index with two documents:
>
> 1. contents: "test bunga bunga test"
> 2. contents: "test bunga test"
>
> We run two SpanNearQueries against this index:
>
> 1. spanNear([contents:bunga, contents:bunga], 0, true)
> 2. spanNear([contents:bunga, contents:bunga], 0, false)
>
> For the first query we get 1 hit. The first document in the example above
> gets matched and the second one doesn't. This makes sense, because here we
> want the term "bunga" followed by another "bunga".
>
> For the second query both documents get matched. Why does the second
> document with a single occurrence of 'bunga' get matched?
>
> A complete example follows.
>
> Thanks in advance!
>
>
>
> Piotr
>
>
> -----------
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.TextField;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.TopDocs;
> import org.apache.lucene.search.spans.SpanNearQuery;
> import org.apache.lucene.search.spans.SpanQuery;
> import org.apache.lucene.search.spans.SpanTermQuery;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.util.Version;
> import java.io.StringReader;
> import static org.junit.Assert.assertEquals;
>
> class SpansBug {
>
>     public static void main(String [] args) throws Exception {
>
>         Directory dir = new RAMDirectory();
>         Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_45);
>         IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_45, analyzer);
>
>         IndexWriter writer = new IndexWriter(dir, iwc);
>         String contents = "contents";
>         Document doc1 = new Document();
>         doc1.add(new TextField(contents, new StringReader("test bunga bunga test")));
>         Document doc2 = new Document();
>         doc2.add(new TextField(contents, new StringReader("test bunga test")));
>
>         writer.addDocument(doc1);
>         writer.addDocument(doc2);
>
>         writer.commit();
>
>         IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
>
>         SpanQuery stq1 = new SpanTermQuery(new Term(contents,"bunga"));
>         SpanQuery stq2 = new SpanTermQuery(new Term(contents,"bunga"));
>         SpanQuery [] spqa = new SpanQuery[]{stq1,stq2};
>
>         SpanNearQuery spanQ1 = new SpanNearQuery(spqa,0, true);
>         SpanNearQuery spanQ2 = new SpanNearQuery(spqa,0, false);
>
>         System.out.println(spanQ1);
>
>         TopDocs tdocs1 = searcher.search(spanQ1,10);
>         // JUnit's assertEquals takes (expected, actual)
>         assertEquals(1, tdocs1.totalHits);
>
>         System.out.println(spanQ2);
>
>         TopDocs tdocs2 = searcher.search(spanQ2,10);
>         // Why does the following assertion fail?
>         assertEquals(1, tdocs2.totalHits);
>
>
>     }
> }
>
