lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Enhance StandardTokenizer to support words which will not be tokenized
Date Wed, 03 Jun 2009 16:10:44 GMT
You'd have to modify the JFlex grammar.  I'd suggest adding in a  
generic "protected words" approach whereby you can pass in a list of  
protected words.

This would be a nice patch/improvement.
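
To make the "protected words" idea concrete, here is a minimal standalone sketch in plain Java. It is not the StandardTokenizer grammar and uses no Lucene or JFlex APIs. The class name ProtectedWordTokenizer and its tokenize method are hypothetical, and the fallback rule (runs of letters and digits) is only a rough stand-in for the real grammar. It shows one possible matching rule: at each token start, try the protected words longest first, and accept a match only if it is not followed by a letter or digit.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: not part of Lucene.
final class ProtectedWordTokenizer {
    // Protected words, longest first, so "c++" is tried before "c#".
    private final List<String> protectedWords = new ArrayList<>();

    ProtectedWordTokenizer(Collection<String> words) {
        for (String w : words) {
            if (!w.isEmpty()) {          // an empty word would never advance the scan
                protectedWords.add(w);
            }
        }
        protectedWords.sort(Comparator.comparingInt(String::length).reversed());
    }

    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // 1. Try a protected word at this position (case-insensitive).
            int end = matchProtected(text, i);
            if (end > i) {
                tokens.add(text.substring(i, end));
                i = end;
                continue;
            }
            // 2. Otherwise emit a plain run of letters and digits.
            if (Character.isLetterOrDigit(text.charAt(i))) {
                int start = i;
                while (i < n && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                i++;                      // 3. Skip separators and punctuation.
            }
        }
        return tokens;
    }

    // Returns the end offset of a protected word starting at 'pos', or -1 if none.
    private int matchProtected(String text, int pos) {
        for (String w : protectedWords) {
            int end = pos + w.length();
            boolean matches = text.regionMatches(true, pos, w, 0, w.length());
            // Reject matches that run into a longer word, e.g. ".net" in ".network".
            boolean cleanEnd = end >= text.length()
                    || !Character.isLetterOrDigit(text.charAt(end));
            if (matches && cleanEnd) {
                return end;
            }
        }
        return -1;
    }
}
```

With the protected list "c++", "c#", ".net", the input "I use C++, c# and .NET daily" yields the tokens I, use, C++, c#, and, .NET, daily. In a real patch the same check would more likely live in the JFlex grammar or in a lookup wrapped around it, and the list would be passed in by the caller.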

-Grant

On Jun 3, 2009, at 4:07 AM, ami dudu wrote:

>
> Hi, I'm using StandardTokenizer, which does a great job for me, but I
> need to enhance it so that words like "c++", "c#", and ".net" are kept
> as is and not tokenized into "c" or "net".
> I know there are other tokenizers, such as KeywordTokenizer and
> WhitespaceTokenizer, but they do not include the StandardTokenizer
> logic.
> Any ideas on the best way to add this enhancement?
>
> Thanks,
> Amid
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

