lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Polishing up my Lucene integration, customizing analyzer
Date Mon, 16 Nov 2009 04:22:33 GMT
Hi Scott, I think only the first two are related to Lucene analysis.

You can easily create an analyzer that does what you want: make it look
like StandardAnalyzer, but also add the CommonGramsFilter (this is in Solr)
to your TokenStream chain.
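
To make the bigram idea concrete, here is a minimal sketch in plain Java of
roughly what a common-grams step emits. It is not the Solr implementation:
commonGrams is a hypothetical helper, the "_" separator and the rule "pair a
token with its successor when either one is a common word" are assumptions
about the behaviour, and it ignores position increments entirely. In a real
analyzer the same step would sit at the end of a StandardAnalyzer-like chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonGramsSketch {

    // Hypothetical helper (not a Lucene or Solr API): keeps every original
    // token and adds a bigram "a_b" whenever a or b is a common word.
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens are assumed to be lowercased already, as they would be
        // after the earlier filters in the chain.
        Set<String> common = new HashSet<>(Arrays.asList("the", "in"));
        List<String> tokens = Arrays.asList("the", "rain", "in", "spain");
        System.out.println(commonGrams(tokens, common));
    }
}
```

Keeping the single tokens alongside the bigrams means ordinary term queries
still match, while phrase queries that contain stop words can use the rarer
bigram terms.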

On Sun, Nov 15, 2009 at 4:58 PM, Scott Ribe <scott_ribe@killerbytes.com> wrote:

> I bought the original Lucene in Action, read it, set up integration with my
> system--a small Java daemon that monitors db for changes & updates the
> index, and listens for queries and processes them...
>
> Now I'd like to customize query parsing to better fit the particular
> application and users. I'm thinking I need a customized analyzer:
>
> - Handles email addresses, acronyms, etc. the way StandardAnalyzer does.
>
> - Turns stop words into Nutch-style bigrams.
>
> - Defaults to "AND" instead of "OR".
>
> - Defaults to in-order phrase queries instead of unordered proximities.
>
> A lot has changed since 2004, as you guys know ;-) So I waded through
> release notes & docs and found many of the differences that mattered for my
> use and got it working with 2.9.0. But I'm a bit lost as to how to get that
> combination of features in an analyzer--obviously a couple of them are
> simple settings to StandardAnalyzer, but not all, particularly those first
> two items...
>
> Any hints or directions appreciated.
>
> --
> Scott Ribe
> scott_ribe@killerbytes.com
> http://www.killerbytes.com/
> (303) 722-0567 voice


-- 
Robert Muir
rcmuir@gmail.com
