lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Multiple Analyzers
Date Mon, 31 Oct 2005 14:55:04 GMT
Yes, you can certainly do all three in one shot.   Look at the source  
code to StandardAnalyzer, build a custom analyzer that used it's core  
tokenization, then filtered through the SnowballFilter, and then a  
custom soundex filter (or metaphone or similar).  That would make for  
some mighty fuzzy searches, but I guess that is your goal.

A custom analyzer needs to be written, along with the custom soundex  
filter, but should only be a handful of lines of code.  Use the  
source code of Lucene in Action as a basis if you'd like.

     Erik




On 31 Oct 2005, at 09:22, Daniel.Clark@sybase.com wrote:

> Thanks Erik.  So you're saying that my approach won't work, right?  I
> understand your recommendations.  Is there a way that I can just
> incorporate Stem, Soundex, and Standard into one search.  In other  
> words,
> don't toggle anything.  Just index using custom analyzer that contains
> Stem, Soundex, and Standard analyzers at once.  And search using  
> the custom
> analyzer that has Stem, Soundex, and Standard.  I can live with  
> just having
> all on compared to only having one of the three.  Please, advise.   
> Thanks
> again for your swift responses.
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Daniel Clark, Senior Consultant
> Sybase Federal Professional Services
> 6550 Rock Spring Drive, Suite 800
> Bethesda, MD  20817
> Office - (301) 896-1103
> Office Fax - (301) 896-1604
> Mobile - (703) 403-0340
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
>              Erik Hatcher
>              <erik@ehatchersol
>               
> utions.com>                                                To
>                                        java-user@lucene.apache.org
>              10/30/2005  
> 07:50                                           cc
>              PM
>                                                                     
> Subject
>                                        Re: Multiple Analyzers
>              Please respond to
>              java-user@lucene.
>                 apache.org
>
>
>
>
>
>
>
>
> On 30 Oct 2005, at 11:03, Daniel.Clark@sybase.com wrote:
>
>> I am implementing a search that allows the user to toggle the
>> Soundex and
>> Stem functionality on and off.  When the Soundex or Sten functions
>> are not
>> on, the search will use the StandardAnalyzer as default.  I
>> implemented the
>> Soundex (SoundexAnalyzer from book Lucene In Action) and Stem
>> (SnowballAnalyzer) analyzer successfully individually.  They only
>> work if I
>> use the analyzers during both indexing and searching.  The problem
>> I have
>> is that I seem to have to use one of the three analyzers or
>> nothing.  I
>> heven't been able to combine them.  The following may help better to
>> understand what I am trying to achieve.  Any help would be greatly
>> appreciated.
>>
>> Indexing
>> ========
>> Index using recursive TokenStream object.
>>
>> public TokenStream tokenStream(String fieldName, Reader reader) {
>>       TokenStream result = new SoundexFilter (
>>                         new SnowballFilter (
>>                            new StopFilter (
>>
>>  new
>> LowerCaseFilter (
>>
>> new StandardFilter (
>> new StandardTokenizer(reader))),
>>
>> StandardAnalyzer.STOP_WORDS)));
>>       return result;
>> }
>>
>
> So you're using a single analyzer for all fields during indexing?
>
>
>> Searching
>> ==========
>> If user selects Soundex function, use SoundexAnalyzer in query.
>> Else if user selects Stem option, use SnowballAnalyzer in query.
>> Else use StandardAnalyzer.
>>
>
> But different analyzers for searching.... you have to be quite
> careful about this sort of thing.  And in this particular case its
> tricky.  You can't stem and soundex during indexing and then
> selectively turn those features off without having indexed the
> various combinations you want available during search.
>
> One way to address this is to index the same text into multiple
> fields with each using a different analysis process.  Then for
> search, line up the query type (stemming, soundex, standard) with the
> proper field.  (side note, this gets even more fun if you're using
> QueryParser and the user enters in "field:term"!).
>
> Another option is to index into multiple indexes - one for each type
> of analysis.  I use this myself for case sensitive and insensitive
> searches.
>
>      Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message