lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <>
Subject Re: Polishing up my Lucene integration, customizing analyzer
Date Sun, 15 Nov 2009 22:39:12 GMT
I'm missing something here. The first two points seem incompatible.

A single analyzer that works like StandardAnalyzer in any way won't
product bigrams of any sort. It seems to me like you'd have to copy the
input into two fields analyzed two different ways.

Or do you ONLY want bigrams on the stop words?

could you expand a bit more, perhaps with an example or two?

The third point is easily doable with a method call on QueryParser

For the fourth, either manually construct the appropriate Span* query
or use the surround parser in contrib.


On Sun, Nov 15, 2009 at 4:58 PM, Scott Ribe <>wrote:

> I bought the original Lucene in Action, read it, set up integration with my
> system--a small Java daemon that monitors db for changes & updates the
> index, and listens for queries and processes them...
> Now I'd like to customize query parsing to better fit the particular
> application and users. I'm thinking I need a customized analyzer:
> - Handles email addresses, acronyms, etc the way StandardAnalyzer does.
> - Turns stop words into Nutch-style bigrams.
> - Defaults to "AND" instead of "OR".
> - Defaults to in-order phrase queries instead of unordered proximities.
> A lot has changed since 2004, as you guys know ;-) So I waded through
> release notes & docs and found many of the differences that mattered for my
> use and got it working with 2.9.0. But I'm a bit lost as to how to get that
> combination of features in an analyzer--obviously a couple of them are
> simple settings to StandardAnalyzer, but not all, particularly those first
> two items...
> Any hints or directions appreciated.
> --
> Scott Ribe
> (303) 722-0567 voice
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message