From: Paul Elschot
To: java-dev@lucene.apache.org
Subject: Re: Request for clarification on unordered SpanNearQuery
Date: Thu, 4 Mar 2010 23:14:59 +0100

There can be so many unordered matches for a given slop that enumerating them all would be too slow for general use.

Note that I have not been very precise here: one should also consider the same term indexed at the same position multiple times (not normal, but not impossible) and, last but not least, nested SpanNearQueries. As Mark said, spans are funny beasts.

Before starting on these 40 hours, you could try discussing design ideas here first. Could you elaborate on what you need to achieve?

Regards,
Paul Elschot

On Thursday 04 March 2010 21:03:09, Goddard, Michael J. wrote:
> Paul (and Mark),
>
> Thank you for answering. Do you suppose "not really straightforward" means "40 hours" or something like that? I'm just trying to get an idea of whether what I'm attempting is worth the effort.
>
> Mike
>
>
> -----Original Message-----
> From: java-dev-return-47351-MICHAEL.J.GODDARD=saic.com@lucene.apache.org on behalf of Paul Elschot
> Sent: Thu 3/4/2010 11:51 AM
> To: java-dev@lucene.apache.org
> Subject: Re: Request for clarification on unordered SpanNearQuery
>
> Michael,
>
> The test for the 4th range fails because the first matching subspans
> (for t1 in this case) is always the one that is advanced first, and the first
> match at that point has less slop (0) than the maximum allowed (1),
> so one might instead try to advance another subspans first.
> But that is not really straightforward to implement, especially when different
> terms can be indexed at the same position.
>
> Perhaps the javadocs for the unordered case should be improved to mention
> that in the unordered case the first subspans is always the one that is
> advanced first.
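
To make the greedy behaviour described above concrete, here is a minimal Java sketch. It is a hypothetical, simplified model and not Lucene's NearSpansUnordered code: each term is reduced to a sorted array of single-token positions, slop is taken as (end - start) minus the number of terms, and after each candidate the term at the smallest current position is always advanced (at the point discussed above, that is t1). The names GreedyUnorderedNear and greedyMatches are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the greedy unordered strategy (NOT Lucene code).
// Each term has a sorted array of single-token positions; a candidate span runs from
// the smallest current position to the largest current position + 1.
class GreedyUnorderedNear {

    /** Returns matches as {start, end} pairs, in the order the greedy scan finds them. */
    static List<int[]> greedyMatches(int[][] positions, int slop) {
        int n = positions.length;
        int[] cursor = new int[n]; // current index into each term's position array
        List<int[]> matches = new ArrayList<>();
        while (true) {
            int minTerm = 0;
            int start = Integer.MAX_VALUE;
            int end = Integer.MIN_VALUE;
            for (int t = 0; t < n; t++) {
                int pos = positions[t][cursor[t]];
                if (pos < start) { // ties go to the earlier term in query order
                    start = pos;
                    minTerm = t;
                }
                end = Math.max(end, pos + 1);
            }
            // Slop = span length minus total length of the sub-spans (1 token each here).
            if (end - start - n <= slop) {
                matches.add(new int[] {start, end});
            }
            // Always advance the term at the smallest position; stop when it runs out.
            cursor[minTerm]++;
            if (cursor[minTerm] == positions[minTerm].length) {
                return matches;
            }
        }
    }

    public static void main(String[] args) {
        // Doc 11: "t1 t2 t1 t3 t2 t3" -> t1 at {0, 2}, t2 at {1, 4}, t3 at {3, 5}
        int[][] positions = { {0, 2}, {1, 4}, {3, 5} };
        for (int[] m : greedyMatches(positions, 1)) {
            System.out.println("[" + m[0] + ", " + m[1] + ")");
        }
    }
}
```

On doc 11 with slop 1 this model prints [0, 4), [1, 4) and [2, 5) and then stops: after [2, 5) it advances t1, which is exhausted. The span [2, 6) would only be reached by advancing t3 (from 3 to 5) instead.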
>
> Regards,
> Paul Elschot
>
> On Thursday 04 March 2010 17:34:26, Goddard, Michael J. wrote:
> > I've been working on some highlighting changes involving Spans (https://issues.apache.org/jira/browse/LUCENE-2287) and could use some help understanding when overlapping Spans are valid. To illustrate, I added the test below to the TestSpans class; this test fails because there is no fourth range.
> >
> > Am I wrong in my expectation that that last range would match?
> >
> > Thanks.
> >
> > Mike
> >
> >
> > // Doc 11 contains "t1 t2 t1 t3 t2 t3"
> > public void testSpanNearUnOrderedOverlap() throws Exception {
> >   boolean ordered = false;
> >   int slop = 1;
> >   SpanNearQuery snq = new SpanNearQuery(
> >       new SpanQuery[] {
> >           makeSpanTermQuery("t1"),
> >           makeSpanTermQuery("t2"),
> >           makeSpanTermQuery("t3") },
> >       slop,
> >       ordered);
> >   Spans spans = snq.getSpans(searcher.getIndexReader());
> >
> >   assertTrue("first range", spans.next());
> >   assertEquals("first doc", 11, spans.doc());
> >   assertEquals("first start", 0, spans.start());
> >   assertEquals("first end", 4, spans.end());
> >
> >   assertTrue("second range", spans.next());
> >   assertEquals("second doc", 11, spans.doc());
> >   assertEquals("second start", 1, spans.start());
> >   assertEquals("second end", 4, spans.end());
> >
> >   assertTrue("third range", spans.next());
> >   assertEquals("third doc", 11, spans.doc());
> >   assertEquals("third start", 2, spans.start());
> >   assertEquals("third end", 5, spans.end());
> >
> >   // Question: why wouldn't this Span be found?
> >   assertTrue("fourth range", spans.next());
> >   assertEquals("fourth doc", 11, spans.doc());
> >   assertEquals("fourth start", 2, spans.start());
> >   assertEquals("fourth end", 6, spans.end());
> >
> >   assertFalse("fifth range", spans.next());
> > }
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
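
As a check on the expectation in the test, here is a brute-force Java sketch that enumerates every unordered match within the slop. It is hypothetical (BruteForceUnorderedNear and allMatches are not Lucene APIs) and uses the same simplified model: single-token terms, slop taken as (end - start) minus the number of terms. It tries every combination of one position per term, so its cost is the product of the position-list sizes (2 * 2 * 2 = 8 here), which is the kind of blow-up that makes full enumeration impractical for general use. It does not handle nested SpanNearQueries and does not deduplicate combinations that yield the same span.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical brute-force enumerator (NOT a Lucene API): walks the cartesian product
// of the terms' position arrays and keeps every combination whose span fits the slop.
class BruteForceUnorderedNear {

    static List<int[]> allMatches(int[][] positions, int slop) {
        List<int[]> matches = new ArrayList<>();
        collect(positions, slop, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, matches);
        // Sort by start, then by end.
        matches.sort((a, b) -> a[0] != b[0] ? Integer.compare(a[0], b[0])
                                            : Integer.compare(a[1], b[1]));
        return matches;
    }

    // Picks one position for term 'term', tracking the running span [start, end).
    private static void collect(int[][] positions, int slop, int term,
                                int start, int end, List<int[]> out) {
        if (term == positions.length) {
            // Slop = span length minus total length of the sub-spans (1 token each).
            if (end - start - positions.length <= slop) {
                out.add(new int[] {start, end});
            }
            return;
        }
        for (int pos : positions[term]) {
            collect(positions, slop, term + 1,
                    Math.min(start, pos), Math.max(end, pos + 1), out);
        }
    }

    public static void main(String[] args) {
        // Doc 11: "t1 t2 t1 t3 t2 t3" -> t1 at {0, 2}, t2 at {1, 4}, t3 at {3, 5}
        int[][] positions = { {0, 2}, {1, 4}, {3, 5} };
        for (int[] m : allMatches(positions, 1)) {
            System.out.println("[" + m[0] + ", " + m[1] + ")");
        }
    }
}
```

For doc 11 with slop 1 this prints [0, 4), [1, 4), [2, 5) and [2, 6). The last one (t1 at 2, t2 at 4, t3 at 5, slop 1) is the fourth range the test expects, so under this definition of slop the span does exist; the unordered SpanNearQuery simply never visits it.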