lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: analyzer effecting phrases?
Date Mon, 20 Dec 2004 19:34:01 GMT
On Dec 20, 2004, at 12:43 PM, Peter Posselt Vestergaard wrote:
> Therefore I turned back to the standard analyzer and now do some 
> replacing
> of the underscores in my ID string to avoid my original problem. This 
> solved
> my phrase problem so that I can now search for phrases. However I 
> still have
> the problem with ",.:" described above. As far as I can see the
> StandardAnalyzer (the StandardTokenizer that is) should tokenize words
> without the ",.:" characters. Am I mistaken? Is there a tokenizer that 
> will
> do this?

StandardAnalyzer does tokenize without ",.:", though it will keep 
domain names together.  Here's an example:

$ ant -emacs AnalyzerDemo
Buildfile: build.xml


       Demonstrates analysis of sample text.

       Refer to the "Analysis" chapter for much more on this
       extremely crucial topic.

Press return to continue...

String to analyze: [This string will be analyzed.]
Example with commas, colons, and dots.  You can get this code from
Running lia.analysis.AnalyzerDemo...
Analyzing "Example with commas, colons, and dots.  You can get this 
code from"
     [Example] [with] [commas,] [colons,] [and] [dots.] [You] [can] 
[get] [this] [code] [from] []

     [example] [with] [commas] [colons] [and] [dots] [you] [can] [get] 
[this] [code] [from] [http] [www] [lucenebook] [com]

     [example] [commas] [colons] [dots] [you] [can] [get] [code] [from] 
[http] [www] [lucenebook] [com]

     [example] [commas] [colons] [dots] [you] [can] [get] [code] [from] 
[http] []

Total time: 7 seconds

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message