lucene-java-user mailing list archives

From Martin Braun <mbr...@uni-hd.de>
Subject Search "C++" with Solrs WordDelimiterFilter
Date Fri, 17 Nov 2006 10:04:43 GMT
hi all,

I would like to make it possible to search for "C++" and "C#".
In the archive I found a hint to customize the appropriate *.jj file
with this code from NutchAnalysis.jj:

// irregular words
| <#IRREGULAR_WORD: (<C_PLUS_PLUS>|<C_SHARP>)>
| <#C_PLUS_PLUS: ("C"|"c") "++" >
| <#C_SHARP: ("C"|"c") "#" >
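
To make clear which strings these token definitions are meant to
match, here is a rough plain-Java equivalent using java.util.regex.
It is only a sketch for illustration: the class IrregularWords and
the method isIrregularWord are hypothetical names and are not part of
Nutch, Lucene or Solr.

```java
import java.util.regex.Pattern;

public class IrregularWords {
    // Hypothetical helper, not part of Nutch, Lucene or Solr.
    // Mirrors <C_PLUS_PLUS> | <C_SHARP>: "C" or "c" followed by "++" or "#".
    private static final Pattern IRREGULAR_WORD =
            Pattern.compile("[Cc](\\+\\+|#)");

    // Returns true only if the whole token is one of the irregular words.
    public static boolean isIrregularWord(String token) {
        return IRREGULAR_WORD.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Print the match result for a few sample tokens.
        for (String token : new String[] {"C++", "c#", "C", "C+", "Java"}) {
            System.out.println(token + " -> " + isIrregularWord(token));
        }
    }
}
```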

I am using a custom analyzer with Yonik's WordDelimiterFilter:

@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowerCaseFilter(
        new WordDelimiterFilter(new WhitespaceTokenizer(reader), 1, 1, 1, 1, 1));
}


But as far as I can see, WordDelimiterFilter works only on the output
of WhitespaceTokenizer, which does not use a JavaCC file.

What would be the best way to integrate this feature (preferably
without changing the Lucene source)?

Should I override WhitespaceTokenizer and use JavaCC? (Are there
any docs on doing this?)

tia,
martin




