lucene-java-user mailing list archives

From Nicola Buso <nb...@ebi.ac.uk>
Subject Re: porting a custom Analyzer from 3.6 -> 4.0
Date Mon, 10 Dec 2012 10:02:25 GMT
Hi,

take a look at the StandardAnalyzer sources for an example:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-common/4.0.0/org/apache/lucene/analysis/standard/StandardAnalyzer.java#StandardAnalyzer
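The shape StandardAnalyzer follows can be reduced to a minimal sketch. It assumes Lucene 4.0 with lucene-analyzers-common on the classpath. `MinimalAnalyzer` and the `terms` helper are hypothetical names, and `LowerCaseFilter` is only a placeholder filter. The Tokenizer is built from the Reader, filters wrap it, and both ends of the chain go into TokenStreamComponents. That is where the "missing" Tokenizer comes from.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical minimal Lucene 4.0 analyzer with the same structure as StandardAnalyzer.
public class MinimalAnalyzer extends Analyzer {
    private static final Version MATCH_VERSION = Version.LUCENE_40;

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(MATCH_VERSION, reader); // start of the chain
        TokenStream result = new LowerCaseFilter(MATCH_VERSION, source);   // end of the chain
        return new TokenStreamComponents(source, result);
    }

    // Collects terms following the 4.0 consumer contract: reset, incrementToken, end, close.
    public static List<String> terms(Analyzer analyzer, String field, String text) throws IOException {
        List<String> out = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // prints [hello, lucene, world]
        System.out.println(terms(new MinimalAnalyzer(), "body", "Hello Lucene World"));
    }
}
```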

In your case you need:
- keep in mind that your analyzer has to be reusable!
- a WhitespaceTokenizer
- a NormalizeCharMap to be used with MappingCharFilter. You can
instantiate a NormalizeCharMap with NormalizeCharMap.Builder. Remember
that NormalizeCharMap.Builder consumes its mappings on every build() call
(the builder is empty afterwards), so build the map once and reuse it.
- have a look at MappingCharFilterFactory (I don't really know how this
works :-) )
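Put together, a port of the quoted 3.6 analyzer might look like the sketch below. It is only a sketch under several assumptions.

It targets Lucene 4.0 and uses:
- `Analyzer.initReader(String, Reader)` to add the char filter,
- the `Analyzer(ReuseStrategy)` constructor with `Analyzer.PerFieldReuseStrategy`, because the default strategy reuses one chain for all fields while this chain depends on the field name,
- `StopFilter(Version, TokenStream, CharArraySet)`.

Check these signatures against your exact 4.x release. The constructor parameters `exactMatchFields` and `stopWords` are hypothetical stand-ins for `IndexManager.getInstance().isExactMatchField(...)` and for the stop-word lookup elided as <SNIP>. Only the "," -> " " mapping from the original is reproduced.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public final class CustomAnalyzer extends Analyzer {
    private static final Version MATCH_VERSION = Version.LUCENE_40;

    // Built once and shared: the NormalizeCharMap is immutable, while
    // NormalizeCharMap.Builder.build() empties the builder's pending mappings.
    private static final NormalizeCharMap CHAR_MAP;
    static {
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add(",", " ");
        // the remaining mappings (elided as <SNIP> in the original) would go here
        CHAR_MAP = builder.build();
    }

    // Hypothetical stand-in for IndexManager.getInstance().isExactMatchField(...)
    private final Set<String> exactMatchFields;
    // Hypothetical: stop words passed in instead of the elided lookup; null means none
    private final CharArraySet stopWords;

    public CustomAnalyzer(Set<String> exactMatchFields, Set<String> stopWords) {
        // The chain depends on the field name, so cache components per field.
        super(new Analyzer.PerFieldReuseStrategy());
        this.exactMatchFields = exactMatchFields;
        this.stopWords = (stopWords == null) ? null
                : new CharArraySet(MATCH_VERSION, stopWords, true); // ignoreCase, as in the 3.6 code
    }

    private boolean isExactMatch(String fieldName) {
        return exactMatchFields.contains(fieldName);
    }

    // Char filters go here: tokenStream() passes every new Reader through
    // initReader(), both on first use and when cached components are reused.
    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        return isExactMatch(fieldName) ? reader : new MappingCharFilter(CHAR_MAP, reader);
    }

    // The Tokenizer is the source of the chain; the optional StopFilter wraps it.
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(MATCH_VERSION, reader);
        TokenStream result = source;
        if (!isExactMatch(fieldName) && stopWords != null) {
            result = new StopFilter(MATCH_VERSION, result, stopWords);
        }
        return new TokenStreamComponents(source, result);
    }

    // Helper to inspect the output: reset, incrementToken, end, close.
    public static List<String> terms(Analyzer analyzer, String field, String text) throws IOException {
        List<String> out = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }
}
```

With "id" as an exact-match field and "the" as a stop word, the input "foo,bar the baz" should produce different terms per field. In an ordinary field it yields foo, bar, baz. In the "id" field it yields foo,bar, the, baz.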

Cheers,

Nicola

On Sun, 2012-12-09 at 14:15 +0100, Clemens Wyss DEV wrote:
> I have a CustomAnalyzer which overrides "public final TokenStream tokenStream ( String fieldName, Reader reader )":
> @Override
> public final TokenStream tokenStream ( String fieldName, Reader reader )
> {
> boolean fieldRequiresExactMatching = IndexManager.getInstance().isExactMatchField( fieldName );
> 
> Reader localreader = reader;
> if ( !fieldRequiresExactMatching )
> {
> 	NormalizeCharMap charMap = new NormalizeCharMap();
> 	charMap.add(",", " ");
> <SNIP>
> 	// wrap/filter reader
> 	localreader = new MappingCharFilter( charMap, reader );			
> }
> TokenStream t = new WhitespaceAnalyzer( IndexManager.CURRENT_LUCENE_VERSION ).tokenStream( fieldName, localreader );
> 
> if ( !fieldRequiresExactMatching )
> {
> 	// apply stop word filter
> 	Set<String> stopWordSet = null;
> <SNIP>
> 	if ( stopWordSet != null )
> 	{
> 		// wrap/filter stream
> 		StopFilter stopFilter = new StopFilter( IndexManager.CURRENT_LUCENE_VERSION, t, stopWordSet, true );
> 		t = stopFilter;
> 	}
> }
> return t;
> }
> 
> MappingCharFilter -> whitespace analysis -> <if condition given> -> stop word filtering
> 
> As of Lucene 4.0, "protected TokenStreamComponents createComponents ( final String fieldName, final Reader reader )" has to be overridden and a TokenStreamComponents has to be returned.
> I don't see how to achieve this ... all I have is a TokenStream but no Tokenizer ...
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


