lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tom Hill <solr-l...@worldware.com>
Subject Re: SpanNearQuery - inOrder parameter
Date Tue, 10 May 2011 20:00:34 GMT
Since no one else is jumping in, I'll say that I suspect that the span
query code does not bother to check to see if two of the terms are the
same.

I think that would account for the behavior you are seeing. Since the
second SpanTermQuery would match the same term the first one did.

Note that I'm not familiar with the span query code, so this is just a
quick deduction.

Not sure how easy it would be to add this duplicate term detection, if
that's the problem.

Tom


On Tue, May 10, 2011 at 5:58 AM, Gregory Tarr <Gregory.tarr@detica.com> wrote:
> Anyone able to help me with the problem below?
>
> Thanks
>
> Greg
>
> -----Original Message-----
> From: Gregory Tarr [mailto:Gregory.tarr@detica.com]
> Sent: 09 May 2011 12:33
> To: java-user@lucene.apache.org
> Subject: RE: SpanNearQuery - inOrder parameter
>
> Attachment didn't work - test below:
>
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field; import
> org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.TopDocsCollector;
> import org.apache.lucene.search.TopScoreDocCollector;
> import org.apache.lucene.search.spans.SpanNearQuery;
> import org.apache.lucene.search.spans.SpanQuery;
> import org.apache.lucene.search.spans.SpanTermQuery;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.util.Version;
> import org.junit.Assert;
> import org.junit.Test;
>
> public class TestSpanNearQueryInOrder {
>
> @Test
> public void testSpanNearQueryInOrder() {  RAMDirectory directory = new
> RAMDirectory();  IndexWriter writer = new IndexWriter(directory, new
> StandardAnalyzer(Version.LUCENE_29), true,
> IndexWriter.MaxFieldLength.UNLIMITED);
>  TopDocsCollector collector = TopScoreDocCollector.create(3, false);
>
>  Document doc = new Document();
>
>  // DOC1
>  doc.add(new Field("text","dddd aaaa bbbb cccc", Field.Store.YES,
> Field.Index.ANALYZED));
>
>  writer.addDocument(doc);
>  doc = new Document();
>
>  // DOC2
>  doc.add(new Field("text","dddd aaaa aaaa cccc"));
>
>  writer.addDocument(doc);
>  doc = new Document();
>
>  // DOC3
>  doc.add(new Field("text","dddd aaaa yyyy aaaa xxxx cccc"));
>
>  writer.addDocument(doc);
>  writer.optimize();
>  writer.close();
>
>  searcher = new IndexSearcher(directory, false);
>
>  SpanQuery[] clauses = new SpanQuery[2];  clauses[0] = new
> SpanTermQuery(new Term("text", "aaaa"));  clauses[1] = new
> SpanTermQuery(new Term("text", "aaaa"));
>
>  // Don't care about order, so setting inOrder = false  SpanNearQuery q
> = new SpanNearQuery(clauses, 1, false);  searcher.search(q, collector);
>
>  // This assert fails - 3 docs are returned. Expecting only DOC2 and
> DOC3
>  Assert.assertEquals("Check 2 results", 2, collector.getTotalHits());
>
>  collector = new TopScoreDocCollector.create(3, false);  clauses = new
> SpanQuery[2];  clauses[0] = new SpanTermQuery(new Term("text", "aaaa"));
> clauses[1] = new SpanTermQuery(new Term("text", "aaaa"));
>
>  // Don't care about order, so setting inOrder = false  q = new
> SpanNearQuery(clauses, 0, false);  searcher.search(q, collector);
>
>  // This assert fails - 3 docs are returned. Expecting only DOC2
> Assert.assertEquals("Check 1 result", 1, collector.getTotalHits()); }
>
> }
>
> ________________________________
>
> From: Gregory Tarr [mailto:Gregory.tarr@detica.com]
> Sent: 09 May 2011 12:29
> To: java-user@lucene.apache.org
> Subject: SpanNearQuery - inOrder parameter
>
>
>
> I attach a junit test which shows strange behaviour of the inOrder
> parameter on the SpanNearQuery constructor, using Lucene 2.9.4.
>
> My understanding of this parameter is that true forces the order and
> false doesn't care about the order.
>
> Using true always works. However using false works fine when the terms
> in the query are distinct, but if they are equivalent, e.g. searching
> for "john john", I do not get the expected results. The workaround seems
> to be to always use true for queries with repeated terms.
>
> Any help?
>
> Thanks
>
> Greg
>
> <<TestSpanNearQueryInOrder.java>>
>
>
> Please consider the environment before printing this email.
>
> This message should be regarded as confidential. If you have received
> this email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard
> copy by an authorised signatory.  The contents of this email may relate
> to dealings with other companies within the Detica Limited group of
> companies.
>
> Detica Limited is registered in England under No: 1337451.
>
> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP,
> England.
>
>
> Please consider the environment before printing this email.
>
> This message should be regarded as confidential. If you have received
> this email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard
> copy by an authorised signatory.  The contents of this email may relate
> to dealings with other companies within the Detica Limited group of
> companies.
>
> Detica Limited is registered in England under No: 1337451.
>
> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP,
> England.
>
> Please consider the environment before printing this email.
>
> This message should be regarded as confidential. If you have received this email in error
please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard copy by an authorised
signatory.  The contents of this email may relate to dealings with other companies within
the Detica Limited group of companies.
>
> Detica Limited is registered in England under No: 1337451.
>
> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message