lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: StandardAnalyzer exclude numbers
Date Mon, 22 Sep 2008 12:50:19 GMT
Agreed. I am always diving into that analyzer too fast <g> Possibly
premature optimization thoughts as well. But scanning each token afterwards
in a filter, and skipping it as soon as you find a digit, will be much
easier and possibly not much slower. It depends on how involved you are or
want to get, I suppose. Personally I would prefer to start a new analyzer
for such a significant change, but for the average Lucene user, pre/post
processing is always going to make more sense. Plus there is enough
overlap in the code that I can see plenty of people preferring not to
split off.
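
For what it's worth, here is a rough sketch of the scan-and-skip logic in plain Java. It deliberately does not use any Lucene classes: the class name DigitTokenSkipper and both method names are made up for illustration, and in a real analyzer the same check would presumably sit inside a TokenFilter subclass wrapped around the StandardTokenizer output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class DigitTokenSkipper {

    // Scans the term and breaks out as soon as the first digit is found.
    static boolean containsDigit(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.isDigit(term.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Keeps only the tokens that contain no digit, preserving their order.
    static List<String> skipNumericTokens(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!containsDigit(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("lucene", "2008", "java", "x86", "user");
        System.out.println(skipNumericTokens(tokens)); // prints [lucene, java, user]
    }
}
```

Note that this version drops mixed tokens such as "x86" entirely. SimpleAnalyzer would instead split on the non-letter characters and keep "x", so if that exact behaviour is wanted, the filter would have to strip the digits rather than skip the whole token.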

黄成 wrote:
> why not use a token filter?
>
> On Mon, Sep 22, 2008 at 8:36 PM, Mark Miller <markrmiller@gmail.com> wrote:
>
>   
>> jim@tera.gr wrote:
>>
>>     
>>> Hello
>>>
>>> Is it possible to exclude numbers using StandardAnalyzer just like
>>> SimpleAnalyzer?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>> It's possible but it's tricky. You would want to copy the StandardAnalyzer
>> into your own Analyzer and then modify the grammar.
>> StandardTokenizerImpl.jflex is where to look, but you will have to learn how
>> to use/compile jflex (look at the build file) to build the parser classes.
>> What you would do though, is start by trying to remove the digit from the
>> Alphanum regex in StandardTokenizerImpl.jflex. You might want to rename
>> alphanum after such a move. That may be as far as you need to go.
>>
>>
>> - Mark
>>
>>
>>
>>
>>     
>
>
>   



