lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Search "C++" with Solrs WordDelimiterFilter
Date Fri, 17 Nov 2006 18:47:45 GMT

WordDelimiterFilter doesn't explicitly use a Tokenizer. That's the
beauty of TokenFilters: you can wrap them around any other TokenStream
instance you want.
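To illustrate the composition point, here is a minimal sketch of the decorator pattern that TokenFilters follow. It assumes a hypothetical, simplified `SimpleTokenStream` interface (loosely modeled on the "pull the next token" shape of Lucene's TokenStream, but NOT Lucene's actual API), plus made-up `WhitespaceSource` and `LowerCaseStage` classes. The filter stage only knows it wraps "some stream", so it works around any source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stream interface; NOT Lucene's API.
interface SimpleTokenStream {
    /** Returns the next token, or null when the stream is exhausted. */
    String next();
}

// A source stream that splits its input on whitespace.
class WhitespaceSource implements SimpleTokenStream {
    private final String[] parts;
    private int pos = 0;

    WhitespaceSource(String text) {
        String trimmed = text.trim();
        this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public String next() {
        return pos < parts.length ? parts[pos++] : null;
    }
}

// A filter stage: wraps ANY SimpleTokenStream and transforms what it yields.
class LowerCaseStage implements SimpleTokenStream {
    private final SimpleTokenStream input;

    LowerCaseStage(SimpleTokenStream input) {
        this.input = input;
    }

    public String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

public class FilterCompositionDemo {
    // Pulls tokens until the stream signals exhaustion with null.
    static List<String> drain(SimpleTokenStream ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleTokenStream ts = new LowerCaseStage(new WhitespaceSource("Hello C++ World"));
        System.out.println(drain(ts)); // prints [hello, c++, world]
    }
}
```

Because `LowerCaseStage` depends only on the interface, swapping `WhitespaceSource` for a different source requires no change to the filter.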

If you have a custom grammar file of your own that you like, you can use
it to build your own Tokenizer and then wrap that in a
WordDelimiterFilter (and any other filters you want) to make a custom
Analyzer. This is all StandardAnalyzer does: it wraps the
StandardTokenizer (which is built from a .jj file) with a few useful
TokenFilters.
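A minimal sketch of the "own tokenizer + filters = analyzer" idea follows. It is a hypothetical, self-contained stand-in, not Lucene code: instead of generating a Tokenizer from a .jj file with JavaCC, it uses a `java.util.regex` pattern whose first two alternatives mirror the `C_PLUS_PLUS` and `C_SHARP` productions quoted below from NutchAnalysis.jj. The names `tokenize`, `filter` and `analyze` are made up for illustration. In real Lucene code the tokenizer would be a Tokenizer subclass, and the filters would be TokenFilters such as WordDelimiterFilter and LowerCaseFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT Lucene's API: a hand-written "tokenizer" whose
// rules stand in for a JavaCC grammar, wrapped by filter stages.
public class CustomAnalyzerSketch {

    // Irregular words are tried first (like ("C"|"c") "++" and ("C"|"c") "#"),
    // then ordinary runs of letters/digits. Alternation is tried left to right.
    private static final Pattern TOKEN =
        Pattern.compile("[Cc]\\+\\+|[Cc]#|[\\p{L}\\p{N}]+");

    /** The "tokenizer": emits every match of TOKEN, in order. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** A "filter" stage: applies a per-token transform to upstream output. */
    static List<String> filter(List<String> in, UnaryOperator<String> stage) {
        List<String> out = new ArrayList<>(in.size());
        for (String t : in) {
            out.add(stage.apply(t));
        }
        return out;
    }

    /** The "analyzer": the tokenizer wrapped by a chain of filters (here just lowercasing). */
    static List<String> analyze(String text) {
        return filter(tokenize(text), t -> t.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Jobs: C++, C# and Java")); // prints [jobs, c++, c#, and, java]
    }
}
```

The special cases live in the tokenizer, so downstream filters receive "C++" and "C#" as whole tokens instead of fragments produced by splitting on punctuation.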


: Date: Fri, 17 Nov 2006 11:04:43 +0100
: From: Martin Braun <mbraun@uni-hd.de>
: Reply-To: java-user@lucene.apache.org, mbraun@uni-hd.de
: To: java-user@lucene.apache.org
: Subject: Search "C++" with Solrs WordDelimiterFilter
:
: hi all,
:
: I would like to make it possible to search for "C++" and "C#". In the
: archive I found a hint to customize the appropriate *.jj file with this
: code from NutchAnalysis.jj:
:
:      // irregular words
: | <#IRREGULAR_WORD: (<C_PLUS_PLUS>|<C_SHARP>)>
: | <#C_PLUS_PLUS: ("C"|"c") "++" >
: | <#C_SHARP: ("C"|"c") "#" >
:
: I am using a custom analyzer with Yonik's WordDelimiterFilter:
:
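: @Override
: public TokenStream tokenStream(String fieldName, Reader reader) {
:     // WordDelimiterFilter flags: generateWordParts, generateNumberParts,
:     // catenateWords, catenateNumbers, catenateAll
:     return new LowerCaseFilter(new WordDelimiterFilter(
:         new WhitespaceTokenizer(reader), 1, 1, 1, 1, 1));
: }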
:
:
: But as far as I can see, WordDelimiterFilter only uses the
: WhitespaceTokenizer, which is not built from a JavaCC file.
:
: What would be the best way to integrate this feature (preferably
: without changing the Lucene source)?
:
: Should I override the WhitespaceTokenizer and use JavaCC? (Are there
: any docs on doing this?)
:
: tia,
: martin
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:



-Hoss

