lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan P Franzone" <jonat...@franzone.com>
Subject RE: Special characters
Date Sat, 20 Oct 2001 21:27:22 GMT
*This message was transferred with a trial version of CommuniGate(tm) Pro*
I just had that problem as well. I got around it by writing my own
Analyzer and Tokenizer. I called the Tokenizer WhitespaceTokenizer and
it basically delimits tokens based on whitespace instead of the default
Character.isLetter(ch). There is a test: Character.isWhitespace(ch) that
I used instead of the isLetter. Then the Analyzer was just like the default
only it used my WhitespaceTokenzier.

This does however present some other problems like punctuation. You don't
really want each word at the end of a sentence being tokenzied with the
period attached, but that's just something to add to the Tokenzier code.

I'm still in testing phases, but when I get a complete solution I'll post
the
code. Hope that helps!

Jonathan

-----Original Message-----
From: Ravi Damarla [mailto:rdmrl@optonline.net]
Sent: Saturday, October 20, 2001 5:19 AM
To: lucene-user@jakarta.apache.org
Subject: Special characters


*This message was transferred with a trial version of CommuniGate(tm) Pro*
Hello all:

I'm trying to add searching for strings containing special characters,
for example, 'c++'. I'm currently using StandardAnalyzer and the
'+' character is being ignored. Is there any way of doing this?

Thanks,
Ravi.
--
|      Ravi Damarla       |     rdmrl (at) optonline.net     |
| End of the Night Creations | http://www.endofthenight.com/ |


Mime
View raw message