lucene-java-user mailing list archives

From Chris Bamford <chris.bamf...@scalix.com>
Subject SnowballAnalyzer question
Date Fri, 08 Aug 2008 12:07:25 GMT
Hi.

I am using the SnowballAnalyzer because of its multi-language stemming 
capabilities - and am very happy with that.
There is one small glitch which I'm hoping to overcome - can I get it to 
split up internet domain names in the same way that StopAnalyzer does?
For example, for the sentence "This is a URL: www.google.de / this is a company 
name: XY&Z Corporation", here is the default output from the two analysers:

 StopAnalyzer:
    [url] [www] [google] [de] [company] [name] [xy] [z] [corporation]

 SnowballAnalyzer:
    [this] [is] [a] [url] [www.google.d] [this] [is] [a] [compani] [name] [xy&z] [corpor]

Ideally I would like "www.google.de" to be split into [www] [google] 
[de] (rather than [www.google.d]), but retain the rest of the 
SnowballAnalyzer's capabilities.
Can I perhaps extend SnowballAnalyzer to allow me to achieve this?
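
To make the desired splitting concrete, here is a minimal plain-Java sketch of what StopAnalyzer appears to do with the sentence above: break the text at every non-letter character, lowercase each token, and drop stop words. It does not use any Lucene classes and does no stemming. The class name LetterSplitSketch and the abbreviated stop list are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LetterSplitSketch {
    // Hypothetical, abbreviated stop list; Lucene's built-in English list is longer.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "is", "the", "this"));

    // Emits lowercase runs of letters, so '.', '&', ':' and '/' all act as separators.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Loop one position past the end so the final token is flushed too.
        for (int i = 0; i <= text.length(); i++) {
            boolean letter = i < text.length() && Character.isLetter(text.charAt(i));
            if (letter) {
                current.append(Character.toLowerCase(text.charAt(i)));
            } else if (current.length() > 0) {
                String token = current.toString();
                if (!STOP_WORDS.contains(token)) {
                    tokens.add(token);
                }
                current.setLength(0);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence =
            "This is a URL: www.google.de / this is a company name: XY&Z Corporation";
        // prints [url, www, google, de, company, name, xy, z, corporation]
        System.out.println(tokenize(sentence));
    }
}
```

Within Lucene itself, one possible route (an untested assumption, not a confirmed answer) would be a custom Analyzer that chains LowerCaseTokenizer, StopFilter and SnowballFilter, rather than extending SnowballAnalyzer.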

Thanks for any tips / pointers,

- Chris

