lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joe Hajek <Joe.Ha...@blackbox.net>
Subject Re(2): Lucene with Number+Text
Date Tue, 26 Mar 2002 08:10:00 GMT
the problem with whitespaceanalyzer is that if you have for example a sentence in the text
say "lucene is indexing." a query for "indexing" will produce no hits because "." is not a
token delimiter. you will have to search for "indexing*".

for me the solution was to write my own tokenizer/analyzer pair

--snip

and 
--snip

public final class myTokenizer extends CharTokenizer {
/** Construct a new LowerCaseTokenizer. */
public myTokenizer(Reader in) {
super(in);
}

/** Collects only characters which satisfy
* {@link Character#isLetter(char)}.*/
protected char normalize(char c) {
return Character.toLowerCase(c);
}


/** Collects only characters which do not satisfy
* {@link Character#isWhitespace(char)}.*/
protected boolean isTokenChar(char c) {
return Character.isLetterOrDigit(c);
}
}
public final class myAnalyzer extends Analyzer {
public final TokenStream tokenStream(String fieldName, Reader reader) {
return new myTokenizer(reader);
}
}
--snip


regards joe

"RAYMOND Romain" <romain.raymond@c-s.fr> writes on 
Tue, 26 Mar 2002 08:53:51 +0100 (MET):

> hello,
> 
> The solution we adopted is to use WhiteSpaceAnalyser.
> If you print the result of a query after parsing it (with parse
> method)
> the tokenizers used delete the numbers from the query.
> But WhiteSpaceAnalyser only tokenizes based on ... spaces, so we can
> search on numbers values ....
> 
> --
> To unsubscribe, e-mail: 
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> 


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message