lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: analyzer effecting phrases?
Date Mon, 20 Dec 2004 19:34:01 GMT
On Dec 20, 2004, at 12:43 PM, Peter Posselt Vestergaard wrote:
> Therefore I turned back to the standard analyzer and now do some 
> replacing
> of the underscores in my ID string to avoid my original problem. This 
> solved
> my phrase problem so that I can now search for phrases. However I 
> still have
> the problem with ",.:" described above. As far as I can see the
> StandardAnalyzer (the StandardTokenizer that is) should tokenize words
> without the ",.:" characters. Am I mistaken? Is there a tokenizer that 
> will
> do this?

StandardAnalyzer does tokenize without ",.:", though it will keep 
domain names together.  Here's an example:

$ ant -emacs AnalyzerDemo
Buildfile: build.xml

AnalyzerDemo:

       Demonstrates analysis of sample text.

       Refer to the "Analysis" chapter for much more on this
       extremely crucial topic.

Press return to continue...

String to analyze: [This string will be analyzed.]
Example with commas, colons, and dots.  You can get this code from 
http://www.lucenebook.com
Running lia.analysis.AnalyzerDemo...
Analyzing "Example with commas, colons, and dots.  You can get this 
code from http://www.lucenebook.com"
   WhitespaceAnalyzer:
     [Example] [with] [commas,] [colons,] [and] [dots.] [You] [can] 
[get] [this] [code] [from] [http://www.lucenebook.com]

   SimpleAnalyzer:
     [example] [with] [commas] [colons] [and] [dots] [you] [can] [get] 
[this] [code] [from] [http] [www] [lucenebook] [com]

   StopAnalyzer:
     [example] [commas] [colons] [dots] [you] [can] [get] [code] [from] 
[http] [www] [lucenebook] [com]

   StandardAnalyzer:
     [example] [commas] [colons] [dots] [you] [can] [get] [code] [from] 
[http] [www.lucenebook.com]




BUILD SUCCESSFUL
Total time: 7 seconds


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message