lucene-java-user mailing list archives

From Martin Braun <>
Subject Search "C++" with Solr's WordDelimiterFilter
Date Fri, 17 Nov 2006 10:04:43 GMT
hi all,

I would like to make it possible to search for "C++" and "C#".
In the archive I found the hint to customize the appropriate *.jj file
with this code from NutchAnalysis.jj:

     // irregular words
| <#C_PLUS_PLUS: ("C"|"c") "++" >
| <#C_SHARP: ("C"|"c") "#" >

I am using a custom analyzer with Yonik's WordDelimiterFilter:

	public TokenStream tokenStream(String fieldName, Reader reader) {
		return new LowerCaseFilter(new WordDelimiterFilter(
				new WhitespaceTokenizer(reader), 1, 1, 1, 1, 1));
	}

But as far as I can see, WordDelimiterFilter here only uses the
WhitespaceTokenizer, which is not based on a JavaCC file.

What would be the best way to integrate this feature, preferably without
changing the Lucene source?

Should I override the WhitespaceTokenizer and use JavaCC? (Are there
any docs on doing this?)
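
To make the question concrete, here is a minimal plain-Java sketch of the
behavior I am after. It uses no Lucene classes; ProtectedTokenSketch,
tokenize and the PROTECTED set are hypothetical names, and the delimiter
stripping is only a crude stand-in for what a delimiter-splitting filter
might do, not the actual WordDelimiterFilter logic. The idea is to split
on whitespace, lowercase, and pass a small list of protected tokens
through untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ProtectedTokenSketch {
    // Hypothetical list of tokens that must survive delimiter stripping.
    private static final Set<String> PROTECTED = Set.of("c++", "c#");

    // Splits on whitespace, lowercases, and strips non-alphanumeric
    // characters from every token that is not in the protected set.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields one empty string from split
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            if (PROTECTED.contains(lower)) {
                tokens.add(lower); // keep "c++" and "c#" exactly as they are
                continue;
            }
            // Crude stand-in for delimiter handling: drop punctuation.
            String stripped = lower.replaceAll("[^\\p{L}\\p{N}]", "");
            if (!stripped.isEmpty()) {
                tokens.add(stripped);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I like C++ and C#"));
        // prints [i, like, c++, and, c#]
    }
}
```

The sketch only protects exact whitespace-delimited matches, so "C++,"
with a trailing comma would still be reduced to "c". The same text would
have to go through this path at both index and query time.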

