lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: Match span of capitalized words
Date Fri, 05 Feb 2010 15:18:11 GMT

On Feb 3, 2010, at 8:57 PM, Max Lynch wrote:

> Hi,
> I would like to do a search for "Microsoft Windows" as a span, but not match
> if words before or after "Microsoft Windows" are upper cased.
> 
> For example, I want this to match: another crash for Microsoft Windows today
> But not this: another crash for Microsoft Windows Server today
> 
> Is this possible?  My first attempt started with the SpanRegexQuery from the
> regex contrib package, but I can't figure out how to put in a term I do want
> to match but don't want to include in the final highlighting match.  Does
> that make sense?
> 
> My example (using WhitespaceAnalyzer since I care about case):
> 
> SpanRegexQuery srq1 = new SpanRegexQuery( new Term("contents", "Chase"));
> SpanRegexQuery srq2 = new SpanRegexQuery( new Term("contents",
> "Bank[\\.]*"));
> SpanRegexQuery srq3 = new SpanRegexQuery( new Term("contents", "[^A-Z]*"));

I'm not sure it supports it, but I wonder if you could use a negative lookahead assertion?
 Most regex languages support it.

-Grant


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message