From: Mark Miller
To: java-user@lucene.apache.org
Subject: Re: Extracting span terms using WeightedSpanTermExtractor
Date: Thu, 7 Jul 2011 19:47:30 -0400
Message-Id: <8DF98893-A7F7-4EE6-9357-8CDA0AD3C15B@gmail.com>

On Jul 7, 2011, at 5:14 PM, Jahangir Anwari wrote:

> I did noticed a strange issue though. When the query is just a
> PhraseQuery(e.g. "everlasting glory"), getWeightedSpanTerms() returns all
> the span terms along with their span positions. But when the query is a
> BooleanQuery containing phrase and non-phrase terms(e.g. "everlasting
> glory"+unity), getWeightedSpanTerms() returns all the span terms but the
> span positions are returned only for the phrase terms(i.e. "everlasting" and
> "glory"). Span positions for the non-phrase term(i.e. "unity") is empty. Any
> ideas why this could be happening?

Positions are only collected for "position sensitive" queries. The
Highlighter framework that I plugged this into already runs through the
TokenStream one token at a time - to highlight a TermQuery, there is no
need to consult positions - just highlight every occurrence seen while
marching through the TokenStream. Which means there is no need to find
those positions either.

If you are looking for those positions, here is a patch to calculate
them for TermQuerys as well. If you open a JIRA issue, seems like a
reasonable option to add to the class.

Index: lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
===================================================================
--- lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java	(revision 1143407)
+++ lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java	(working copy)
@@ -133,7 +133,7 @@
       sp.setBoost(query.getBoost());
       extractWeightedSpanTerms(terms, sp);
     } else if (query instanceof TermQuery) {
-      extractWeightedTerms(terms, query);
+      extractWeightedSpanTerms(terms, new SpanTermQuery(((TermQuery)query).getTerm()));
     } else if (query instanceof SpanQuery) {
       extractWeightedSpanTerms(terms, (SpanQuery) query);
     } else if (query instanceof FilteredQuery) {

- Mark Miller
lucidimagination.com
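
To illustrate the distinction drawn above between term highlighting and position-sensitive matching, here is a toy sketch in plain Java. It does not use Lucene at all: the class PositionSketch and its methods highlightTerm, phraseSpans and termSpans are hypothetical names made up for this example, tokens are a simple String array, and spans are stored as {start, end} pairs with an inclusive end, which is an assumption of the sketch rather than a statement about the extractor's internals. It only shows why a single term can be highlighted token by token without positions, and how treating a term as a one-token span yields positions for it too, which is roughly the idea behind the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names come from Lucene.
final class PositionSketch {

    // Term case: decide token by token, no positions are needed.
    static String highlightTerm(String[] tokens, String term) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(token.equals(term) ? "<B>" + token + "</B>" : token);
        }
        return sb.toString();
    }

    // Phrase case: a match depends on neighbouring tokens, so the
    // start and end positions of every match have to be recorded.
    static List<int[]> phraseSpans(String[] tokens, String[] phrase) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                // end position is inclusive in this sketch
                spans.add(new int[] {i, i + phrase.length - 1});
            }
        }
        return spans;
    }

    // Analogue of the patch: treat a single term as a one-token span
    // so that its positions are collected as well.
    static List<int[]> termSpans(String[] tokens, String term) {
        return phraseSpans(tokens, new String[] {term});
    }

    public static void main(String[] args) {
        String[] tokens = "everlasting glory and unity in everlasting glory".split(" ");
        System.out.println(highlightTerm(tokens, "unity"));
        for (int[] span : phraseSpans(tokens, new String[] {"everlasting", "glory"})) {
            System.out.println("phrase span: " + span[0] + "-" + span[1]);
        }
        for (int[] span : termSpans(tokens, "unity")) {
            System.out.println("term span: " + span[0] + "-" + span[1]);
        }
    }
}
```

In this toy model the plain term path never builds a span list, which mirrors the empty positions reported for "unity" in the question, while termSpans produces them at the cost of the extra matching pass.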