tomcat-users mailing list archives

From Ryan O'Hara <oh...@genome.chop.edu>
Subject Help with Custom Analyzer
Date Mon, 16 Oct 2006 16:28:00 GMT
I have a few questions regarding writing a custom analyzer.

My situation is that I would like to use the StandardAnalyzer, but
with some data-specific rules. I was wondering whether there is a way
to tell the StandardAnalyzer to treat a string of text that would
normally be tokenized into more than one token as a single token
(maybe by putting quotes around the text). For example, say the
StandardAnalyzer normally splits the string
ohara@genome.chop.edu into 4 tokens, but I want it to produce only
1 token. Could I accomplish this by surrounding the string with
quotes, or by using some other kind of flag?
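To illustrate the "quotes as a flag" idea, here is a minimal, Lucene-free sketch in plain Java. It does not use or reproduce StandardAnalyzer; the class name QuoteAwareTokenizer and its rules are hypothetical. Outside quotes it splits on any character that is not a letter or digit; inside double quotes it keeps everything as one token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free sketch (not StandardAnalyzer): split text on
// non-alphanumeric characters, but keep anything between double quotes
// as a single token.
public class QuoteAwareTokenizer {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // A quote ends the current token and toggles quoted mode.
                flush(current, tokens);
                inQuotes = !inQuotes;
            } else if (inQuotes || Character.isLetterOrDigit(c)) {
                // Inside quotes every character is kept, including '@' and '.'.
                current.append(c);
            } else {
                // Outside quotes, punctuation and whitespace separate tokens.
                flush(current, tokens);
            }
        }
        flush(current, tokens); // emit the final token, if any
        return tokens;
    }

    // Add the buffered characters as a token (if non-empty) and reset the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("mail ohara@genome.chop.edu"));     // [mail, ohara, genome, chop, edu]
        System.out.println(tokenize("mail \"ohara@genome.chop.edu\"")); // [mail, ohara@genome.chop.edu]
    }
}
```

An unclosed quote keeps the rest of the text as one token. In Lucene itself, logic like this would presumably live in a custom Tokenizer subclass returned from the analyzer's tokenStream(); check the Tokenizer API of the Lucene version in use before adapting it.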

Another question I have is how to modify the text being analyzed.
From how I interpret what I have read (which could easily be off),
it looks like, to accomplish what I described above, I will have to
add some code to my custom analyzer's tokenStream method. I see that
tokenStream() takes a field name and a Reader as parameters. Would I
add my rules by editing the text that comes from the Reader? If so,
would manipulating the text be easier if I first converted the Reader
into a String?
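One way to read the Reader-to-String idea: drain the Reader into a String, apply the rewrite rules, and wrap the result in a new StringReader for the tokenizer. The sketch below uses only java.io and java.util.regex; the class ReaderRewriter, its e-mail pattern, and the "replace '@' and '.' with '_'" rule are all hypothetical examples, not part of Lucene.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: drain a Reader into a String, apply a rewrite rule,
// and hand a fresh Reader to whatever tokenizer comes next.
public class ReaderRewriter {

    // Deliberately simple e-mail pattern (an assumption, not a full RFC parser).
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Read every character from the Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Example rule: replace '@' and '.' inside e-mail addresses with '_'
    // so a tokenizer that treats '_' as a word character keeps them together.
    static String applyRules(String text) {
        Matcher m = EMAIL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String protectedEmail = m.group().replaceAll("[@.]", "_");
            m.appendReplacement(out, Matcher.quoteReplacement(protectedEmail));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Whole pipeline: Reader in, rewritten Reader out.
    static Reader rewrite(Reader reader) throws IOException {
        return new StringReader(applyRules(readAll(reader)));
    }

    public static void main(String[] args) throws IOException {
        Reader rewritten = rewrite(new StringReader("contact ohara@genome.chop.edu today"));
        System.out.println(readAll(rewritten)); // contact ohara_genome_chop_edu today
    }
}
```

Inside an analyzer, the Reader returned by rewrite() could be passed to the tokenizer in tokenStream(). Two trade-offs are worth noting: the whole field value is held in memory, and this particular rule swaps one character for one character, so token offsets still line up with the original text; a rule that changes the text length would shift them.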

Any help is greatly appreciated.  Thanks.

-Ryan

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

