lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Trivial Update of "AnalyzersTokenizersTokenFilters" by SteveRowe
Date Wed, 15 Dec 2010 22:53:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by SteveRowe.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=100&rev2=101

--------------------------------------------------

  
  Like !StandardTokenizer, this tokenizer implements the word boundary rules from [[http://unicode.org/reports/tr29/#Word_Boundaries|Unicode
standard annex UAX#29]].  In addition, this tokenizer recognizes: full URLs using the `file://`,
`http(s)://`, and `ftp://` schemes; hostnames with a registered TLD (top-level domain, e.g.
".com"); IPv4 and IPv6 addresses; and e-mail addresses.
  
- In addition to the token types output by !StandardTokenizer from [[Solr3.1]] onward, !UAX29URLEmailTokenizer
can also output `<URL>` and `<EMAIL>` token types.
+ In addition to the token types output by !StandardTokenizer from [[Solr3.1]] onward, UAX29URLEmailTokenizer
can also output `<URL>` and `<EMAIL>` token types.
  
   . Example: `"Visit http://accarol.com/contact.htm?from=external&a=10 or e-mail bob.cratchet@accarol.com"`

   . `==> ALPHANUM:"Visit", URL:"http://accarol.com/contact.htm?from=external&a=10",
ALPHANUM:"or", ALPHANUM:"e-mail", EMAIL:"bob.cratchet@accarol.com"`
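
  A minimal sketch of how this tokenizer might be wired into a field type in `schema.xml`, assuming the usual factory naming (`solr.UAX29URLEmailTokenizerFactory`). The field type name `text_urlemail` is hypothetical and chosen only for illustration; no token filters are applied, so URLs and e-mail addresses would be indexed exactly as tokenized.

```xml
<!-- Hypothetical field type name; only the tokenizer factory is the point of this sketch. -->
<fieldType name="text_urlemail" class="solr.TextField">
  <analyzer>
    <!-- Emits <URL> and <EMAIL> token types in addition to the StandardTokenizer types. -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
  </analyzer>
</fieldType>
```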
