lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wettin <karl.wet...@gmail.com>
Subject Re: Standard Analyzer
Date Mon, 25 Aug 2008 09:26:34 GMT

25 aug 2008 kl. 11.14 skrev Kalani Ruwanpathirana:

> Hi,
>
> Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive.

Then you simply add a LowercaseFilter to the chain in the Analyzer:

public final class WhitespaceAnalyzer extends Analyzer {
   public TokenStream tokenStream(String fieldName, Reader reader) {
-    return new WhitespaceTokenizer(reader);
+    TokenStream ts = new WhitespaceTokenizer(reader);
+    ts = new LowercaseFilter(ts);
+    return ts;
   }


> If I need to search for words like "correct?", "<html>" (it escapes  
> <, > and
> another few characters too) I need to index those kind of words.

That sounds like an XY-problem to me:
http://www.perlmonks.org/index.pl?node_id=542341

What I really was asking about is what problem it is you are trying to  
solve by indexing and searching for these sort of tokens.


        karl


>
>
> On Mon, Aug 25, 2008 at 1:15 PM, Karl Wettin <karl.wettin@gmail.com>  
> wrote:
>
>>
>> 25 aug 2008 kl. 09.19 skrev Kalani Ruwanpathirana:
>>
>> Hi,
>>>
>>> I am using StandardAnalyzer when creating the Lucene index. It  
>>> indexes the
>>> word "wo&rk" as it is but does not index the word "wo*rk" in that  
>>> manner.
>>> Can I index such words (including * and ?) as it is? Otherwise I  
>>> have no
>>> way
>>> to index and search for words like "wo*rk", you?, etc.
>>>
>>
>>
>> Try an alternative analyzer, perhaps WhitespaceAnalyzer?  
>> (StandardAnalyzer
>> will index wo&rk as a single term because it contains a rule to  
>> handle names
>> such as AT&T.)
>>
>> You should probably also explain why you need to create an index  
>> like this.
>>
>>
>>
>>       karl
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> -- 
> Kalani Ruwanpathirana
> Department of Computer Science & Engineering
> University of Moratuwa


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message