lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by Bill Bell
Date Fri, 08 Jul 2011 06:56:31 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "AnalyzersTokenizersTokenFilters" page has been changed by Bill Bell:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?action=diff&rev1=121&rev2=122

  
  === solr.HTMLStripCharFilterFactory ===
  Creates `org.apache.solr.analysis.HTMLStripCharFilter`. `HTMLStripCharFilter` strips HTML from the input stream and passes the result to the next `CharFilter` or to the `Tokenizer`. Like other CharFilters, it is specified using a <charFilter> tag and must come before the <tokenizer>. An example:
+ 
  {{{
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
@@ -116, +117 @@

    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  }}}
- 
  HTML stripping features:
  
   * The input need not be an HTML document, as only constructs that look like HTML will be removed.
@@ -134, +134 @@

     * terminating '`;`' is mandatory to avoid false matches on something like "`Alpha&Omega Corp`"
  
  HTML stripping examples:
- ||{{{my <a href="www.foo.bar">link</a> }}}||`my link `||
+ ||{{{my <a href="www.foo.bar">link</a> }}} ||`my link ` ||
- ||{{{<br>hello<!--comment--> }}}||`hello `||
+ ||{{{<br>hello<!--comment--> }}} ||`hello ` ||
- ||{{{hello<script><!-- f('<!--internal--></script>'); --></script>
}}}||`hello `||
+ ||{{{hello<script><!-- f('<!--internal--></script>'); --></script>
}}} ||`hello ` ||
- ||{{{if a<b then print a; }}}||`if a<b then print a; `||
+ ||{{{if a<b then print a; }}} ||`if a<b then print a; ` ||
- ||{{{hello <td height=22 nowrap align="left"> }}}||`hello `||
+ ||{{{hello <td height=22 nowrap align="left"> }}} ||`hello ` ||
- ||{{{a<b &#65; Alpha&Omega O}}} ||`a<b A Alpha&Omega O `||
+ ||{{{a<b &#65; Alpha&Omega O}}} ||`a<b A Alpha&Omega O ` ||
- ||{{{M&eacute;xico}}}||`México`||
+ ||{{{M&eacute;xico}}} ||`México` ||
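The table's input/output pairs can be reproduced by a small approximation of the stripping rules. The sketch below is hypothetical and for illustration only. `strip_html` and its helpers are invented names, not part of Solr or Lucene. Solr's actual `HTMLStripCharFilter` is a Java `CharFilter` that streams its input and must also map token offsets back to the original text, whereas this sketch works on a whole string. It covers only what the table exercises: comments, tags with bare, unquoted or quoted attributes, `<script>`/`<style>` bodies (including quoted strings inside script comments), and `;`-terminated numeric and named entities, looked up via Python's `html.entities.name2codepoint`.

```python
import re
from html.entities import name2codepoint

# Hypothetical approximation of the rules above; NOT Solr's HTMLStripCharFilter,
# which is a streaming Java CharFilter that also corrects token offsets.

# Start/end tag: optional '/', a name, attributes that are bare (nowrap),
# unquoted (height=22) or quoted (align="left"), optional '/', mandatory '>'.
# Text like "a<b then ..." never reaches a '>' under this grammar, so it stays.
TAG_RE = re.compile(
    r"<(/?)([A-Za-z][A-Za-z0-9]*)"
    r"(?:\s+[A-Za-z_:][-A-Za-z0-9_:.]*"
    r"(?:\s*=\s*(?:\"[^\"]*\"|'[^']*'|[^\s\"'>]+))?)*"
    r"\s*/?>"
)
# Numeric (&#65; &#x41;) or named (&eacute;) entity; the ';' is mandatory,
# so "Alpha&Omega Corp" is left alone.
ENTITY_RE = re.compile(r"&(?:#([0-9]+)|#[xX]([0-9A-Fa-f]+)|([A-Za-z][A-Za-z0-9]*));")
RAW_TEXT_TAGS = ("script", "style")  # the whole body is dropped, not just the tags


def _comment_end(text, i, in_script):
    """Index just past the '-->' closing a comment whose body starts at i, or -1.

    Inside <script>, quoted strings are skipped, so a '-->' or '</script>'
    inside a JavaScript string does not end the comment.
    """
    n = len(text)
    while i < n:
        if text.startswith("-->", i):
            return i + 3
        if in_script and text[i] in "'\"":
            close = text.find(text[i], i + 1)
            if close == -1:
                return -1
            i = close + 1
            continue
        i += 1
    return -1


def _skip_raw_body(text, i, name):
    """Index just past the '</name>' closing a script/style body that starts at i."""
    end_re = re.compile(r"</" + name + r"\s*>", re.IGNORECASE)
    n = len(text)
    while i < n:
        if text.startswith("<!--", i):
            end = _comment_end(text, i + 4, in_script=(name == "script"))
            if end == -1:
                return n
            i = end
            continue
        m = end_re.match(text, i)
        if m:
            return m.end()
        i += 1
    return n  # unterminated body: drop the rest of the input


def _decode_entity(m):
    """Character for a matched entity, or None if the name is unknown."""
    dec, hexa, name = m.groups()
    if name is not None:
        cp = name2codepoint.get(name)
    else:
        cp = int(dec) if dec is not None else int(hexa, 16)
    if cp is None or cp > 0x10FFFF:
        return None
    return chr(cp)


def strip_html(text: str) -> str:
    """Remove HTML-looking markup and decode ';'-terminated entities."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "<":
            if text.startswith("<!--", i):
                end = _comment_end(text, i + 4, in_script=False)
                if end != -1:  # an unterminated comment is kept as text
                    i = end
                    continue
            else:
                m = TAG_RE.match(text, i)
                if m:
                    i = m.end()
                    name = m.group(2).lower()
                    is_open = not m.group(1) and not m.group(0).endswith("/>")
                    if is_open and name in RAW_TEXT_TAGS:
                        i = _skip_raw_body(text, i, name)
                    continue
        elif ch == "&":
            m = ENTITY_RE.match(text, i)
            if m:
                decoded = _decode_entity(m)
                if decoded is not None:
                    out.append(decoded)
                    i = m.end()
                    continue
        out.append(ch)  # plain text, or markup that did not parse
        i += 1
    return "".join(out)
```

On the table's inputs this yields `my link `, `hello `, `if a<b then print a; `, `a<b A Alpha&Omega O` and `México`. Because the sketch simply deletes markup, surrounding whitespace is kept as-is, so `hello <td height=22 nowrap align="left"> ` comes out with two spaces; Solr's own whitespace handling may differ.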
  
  
  
