lucene-dev mailing list archives

From ami dudu <amidu...@gmail.com>
Subject Re: Enhance StandardTokenizer to support words which will not be tokenized
Date Wed, 03 Jun 2009 19:10:22 GMT

This could be a good solution, but it would have to be maintained with every
update of the StandardAnalyzer rules.
Is there a way to work around that?
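
One possible direction, offered only as a rough sketch and not as something
proposed in this thread: keep the protected-word list outside the grammar, so
that grammar updates never touch it. The example below does not use Lucene or
the StandardTokenizer rules at all. The class name ProtectedWordTokenizer and
its letter-or-digit fallback rule are hypothetical stand-ins, chosen only to
show how a caller-supplied list can take priority over the default splitting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, NOT part of Lucene. It illustrates the "protected
// words" idea with a plain regex instead of the JFlex grammar.
final class ProtectedWordTokenizer {
    private static final String WORD_CHAR = "[\\p{L}\\p{N}]";
    private final Pattern pattern;

    ProtectedWordTokenizer(List<String> protectedWords) {
        // Quote each word so that "+", "#" and "." are matched literally,
        // and try longer words first.
        String alternation = protectedWords.stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // A protected word only counts when it is not glued to a letter or
        // digit on either side; otherwise fall back to letter/digit runs
        // (a simplified stand-in for the real tokenizer rules).
        String protectedPart = alternation.isEmpty()
                ? ""
                : "(?<!" + WORD_CHAR + ")(?:" + alternation + ")(?!" + WORD_CHAR + ")|";
        this.pattern = Pattern.compile(protectedPart + WORD_CHAR + "+",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        ProtectedWordTokenizer t =
                new ProtectedWordTokenizer(Arrays.asList("c++", "c#", ".net"));
        // prints [i, use, c++, c#, and, .net, daily]
        System.out.println(t.tokenize("I use C++, c# and .NET daily."));
    }
}
```

Because the list is passed in at construction time, the fallback rule can
change without the protected words being touched. Whether the same separation
can be achieved around the real StandardTokenizer is the open question here.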


Grant Ingersoll-6 wrote:
> 
> You'd have to modify the JFlex grammar.  I'd suggest adding in a  
> generic "protected words" approach whereby you can pass in a list of  
> protected words.
> 
> This would be a nice patch/improvement.
> 
> -Grant
> 
> On Jun 3, 2009, at 4:07 AM, ami dudu wrote:
> 
>>
>> Hi, I'm using a StandardTokenizer, which does a great job for me, but I
>> need to enhance it somehow so that words like "c++", "c#", and ".net" are
>> kept as is and not tokenized into "c" or "net".
>> I know that there are other tokenizers, such as KeywordTokenizer and
>> WhitespaceTokenizer, but they do not include the StandardTokenizer logic.
>> Any ideas on the best way to add this enhancement?
>>
>> Thanks,
>> Amid
>> -- 
>> View this message in context:
>> http://www.nabble.com/Enhance-StandardTokenizer-to-support-words-which-will-not-be-tokenized-tp23849495p23849495.html
>> Sent from the Lucene - Java Developer mailing list archive at  
>> Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 

-- 
View this message in context: http://www.nabble.com/Enhance-StandardTokenizer-to-support-words-which-will-not-be-tokenized-tp23849495p23857450.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

