From: Erik Hatcher
Subject: Re: Does highlighter highlight phrases only?
Date: Fri, 1 Jul 2005 06:15:26 -0400
To: java-user@lucene.apache.org

On Jun 30, 2005, at 4:35 PM, markharw00d wrote:

> Hi Erik,
> Yes I was thinking that code could form the basis of a new
> highlighter.
>
> I've just attached a QuerySpansExtractor to the bugzilla entry for
> the new highlighter. This class produces Spans from queries other
> than SpanXxxxQueries, e.g. phrase, term and booleans.
> I'm thinking you can throw the text to be highlighted as a single
> doc into a MemIndex, extract the spans using the
> QuerySpansExtractor and the MemIndex's reader (need to expose a
> getReader method on this - I'm working on it), then use some new
> highlighting logic on the Spans.
>
> Sound reasonable?

I think so. One minor issue... a SpanNearQuery is not entirely equal to a PhraseQuery when there is slop involved.
You have this:

    SpanNearQuery sp = new SpanNearQuery(clauses, query.getSlop(), false);

Here's a test from Lucene in Action that demonstrates:

    public void testSpanNearQuery() throws Exception {
      SpanQuery[] quick_brown_dog =
          new SpanQuery[]{quick, brown, dog};
      SpanNearQuery snq =
          new SpanNearQuery(quick_brown_dog, 0, true);
      assertNoMatches(snq);

      snq = new SpanNearQuery(quick_brown_dog, 4, true);
      assertNoMatches(snq);

      snq = new SpanNearQuery(quick_brown_dog, 5, true);
      assertOnlyBrownFox(snq);

      // interesting - even a sloppy phrase query would require
      // more slop to match
      snq = new SpanNearQuery(new SpanQuery[]{lazy, fox}, 3, false);
      assertOnlyBrownFox(snq);

      PhraseQuery pq = new PhraseQuery();
      pq.add(new Term("f", "lazy"));
      pq.add(new Term("f", "fox"));
      pq.setSlop(4);
      assertNoMatches(pq);

      pq.setSlop(5);
      assertOnlyBrownFox(pq);
    }

So to be entirely accurate, an offset will be needed to get SpanNearQuery to match PhraseQuery, though I have a feeling (I'm not thinking through the details at the moment) that there is an edge case or two that is not compatible. A PhraseQuery with slop of 1, for example - can a SpanNearQuery be set up to match that exactly? I don't think so... a PhraseQuery with slop of 1 cannot match in reverse order, only in order with an optional hole between terms.

But, I like the idea of highlighting spans by converting other query types to get the Spans.

    Erik
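
For anyone who wants to see where the slop values in the test above come from, here is a minimal sketch of the arithmetic in plain Java. It is not Lucene code: the class SlopModel and its three methods are hypothetical names made up for illustration, and it assumes each term occurs exactly once in the document. It also assumes the matching document is "the quick brown fox jumps over the lazy dog" with positions counted from 0, which is not stated in this message.

```java
// Simplified model of slop arithmetic. NOT Lucene code; all names are
// hypothetical. Assumes each term occurs exactly once in the document.
class SlopModel {

    // Smallest slop at which a sloppy phrase would match in this model:
    // shift each term's document position back by its index in the
    // phrase, then measure the spread of the shifted values.
    static int phraseSlop(int[] docPositions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < docPositions.length; i++) {
            int shifted = docPositions[i] - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // Smallest slop for an unordered span-near match in this model:
    // positions covered by the enclosing span, minus the number of terms.
    static int unorderedSpanSlop(int[] docPositions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int p : docPositions) {
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        return (max - min + 1) - docPositions.length;
    }

    // Smallest slop for an in-order span-near match in this model:
    // the sum of the gaps between consecutive terms.
    // Returns -1 when the terms are not in increasing order (no match).
    static int orderedSpanSlop(int[] docPositions) {
        int gaps = 0;
        for (int i = 1; i < docPositions.length; i++) {
            int gap = docPositions[i] - docPositions[i - 1] - 1;
            if (gap < 0) {
                return -1;
            }
            gaps += gap;
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Assumed document: "the quick brown fox jumps over the lazy dog"
        int quick = 1, brown = 2, fox = 3, lazy = 7, dog = 8;

        // quick, brown, dog in order
        System.out.println(orderedSpanSlop(new int[]{quick, brown, dog}));
        // lazy, fox in any order
        System.out.println(unorderedSpanSlop(new int[]{lazy, fox}));
        // phrase "lazy fox" against a document where fox precedes lazy
        System.out.println(phraseSlop(new int[]{lazy, fox}));
    }
}
```

Under those assumptions the three calls in main give 5, 3 and 5, which lines up with the slop values at which the assertions in the test start to pass. Each array lists document positions in query order, so {lazy, fox} is {7, 3}.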