lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Max Lynch <ihas...@gmail.com>
Subject Match span of capitalized words
Date Thu, 04 Feb 2010 01:57:33 GMT
Hi,
I would like to do a search for "Microsoft Windows" as a span, but not match
if words before or after "Microsoft Windows" are upper cased.

For example, I want this to match: another crash for Microsoft Windows today
But not this: another crash for Microsoft Windows Server today

Is this possible?  My first attempt started with the SpanRegexQuery from the
regex contrib package, but I can't figure out how to put in a term I do want
to match but don't want to include in the final highlighting match.  Does
that make sense?

My example (using WhitespaceAnalyzer since I care about case):

 SpanRegexQuery srq1 = new SpanRegexQuery( new Term("contents", "Chase"));
 SpanRegexQuery srq2 = new SpanRegexQuery( new Term("contents",
"Bank[\\.]*"));
 SpanRegexQuery srq3 = new SpanRegexQuery( new Term("contents", "[^A-Z]*"));


Thanks,
Max

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message