From: Steven A Rowe
To: "java-user@lucene.apache.org"
Date: Fri, 5 Feb 2010 17:23:12 -0500
Subject: RE: Match span of capitalized words
Message-ID: <2D127F11DC79714E9B6A43AC9458147F36661FEE@suex07-mbx-03.ad.syr.edu>
References: <3836ec641002031757y6850e4c9vf99c203a230e313@mail.gmail.com>

Hi Max,

On 02/05/2010 at 10:18 AM, Grant Ingersoll wrote:
> On Feb 3, 2010, at 8:57 PM, Max Lynch wrote:
> > Hi, I would like to do a search for "Microsoft Windows" as a span, but
> > not match if words before or after "Microsoft Windows" are upper cased.
> >
> > For example, I want this to match: another crash for Microsoft Windows
> > today But not this: another crash for Microsoft Windows Server today
> >
> > Is this possible? My first attempt started with the SpanRegexQuery
> > from the regex contrib package, but I can't figure out how to put in a
> > term I do want to match but don't want to include in the final
> > highlighting match. Does that make sense?
> >
> > My example (using WhitespaceAnalyzer since I care about case):
> >
> > SpanRegexQuery srq1 = new SpanRegexQuery(new Term("contents", "Chase"));
> > SpanRegexQuery srq2 = new SpanRegexQuery(new Term("contents", "Bank[\\.]*"));
> > SpanRegexQuery srq3 = new SpanRegexQuery(new Term("contents", "[^A-Z]*"));
>
> I'm not sure it supports it, but I wonder if you could use a negative
> lookahead assertion? Most regex languages support it.

I don't think this would work, since the input to a SpanRegexQuery regex is a single Term; following Terms are not included in the input.

I *think* you can get what you want using SpanNotQuery - something like the following, using your "Microsoft Windows" example:

SpanNot:
    include:
        SpanNear(in-order=true, slop=0):
            SpanTerm: "Microsoft"
            SpanTerm: "Windows"
    exclude:
        SpanNear(in-order=true, slop=0):
            SpanTerm: "Microsoft"
            SpanTerm: "Windows"
            SpanRegex: "^\\p{Lu}.*"

Steve
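
To check the intent of the SpanNot structure above, here is a minimal plain-Java sketch that mimics the span semantics over whitespace-separated terms. It does not use Lucene at all: the class SpanNotSketch and the method matchStarts are hypothetical names made up for illustration, and the only assumption about SpanNotQuery is that it drops include matches that overlap any exclude match. Like the structure above, it only rejects a capitalized term after the phrase, not before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SpanNotSketch {
    // Same pattern as the SpanRegex clause: a term that starts with an uppercase letter.
    private static final Pattern CAPITALIZED = Pattern.compile("^\\p{Lu}.*");

    /** Term positions where "Microsoft Windows" matches and survives the exclude clause. */
    static List<Integer> matchStarts(String text) {
        // Whitespace tokenization, so case is preserved (as with WhitespaceAnalyzer).
        String[] terms = text.trim().split("\\s+");
        List<int[]> include = new ArrayList<>(); // spans as [start, end)
        List<int[]> exclude = new ArrayList<>();
        for (int i = 0; i + 1 < terms.length; i++) {
            if (terms[i].equals("Microsoft") && terms[i + 1].equals("Windows")) {
                // include: in-order, slop 0 span over the two terms
                include.add(new int[] {i, i + 2});
                // exclude: the same two terms directly followed by a capitalized term
                if (i + 2 < terms.length && CAPITALIZED.matcher(terms[i + 2]).matches()) {
                    exclude.add(new int[] {i, i + 3});
                }
            }
        }
        List<Integer> starts = new ArrayList<>();
        for (int[] inc : include) {
            boolean overlaps = false;
            for (int[] exc : exclude) {
                // Two half-open ranges overlap when each starts before the other ends.
                if (inc[0] < exc[1] && exc[0] < inc[1]) {
                    overlaps = true;
                    break;
                }
            }
            if (!overlaps) {
                starts.add(inc[0]);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("another crash for Microsoft Windows today"));        // [3]
        System.out.println(matchStarts("another crash for Microsoft Windows Server today")); // []
    }
}
```

With real Lucene classes the same shape would presumably be built from SpanTermQuery, SpanNearQuery, SpanRegexQuery and SpanNotQuery, but that wiring is not shown or tested here.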