lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "ExtractingRequestHandler" by JanHoydahl
Date Fri, 22 Jun 2012 12:24:44 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "ExtractingRequestHandler" page has been changed by JanHoydahl:


   * captureAttr=true|false - Index attributes of the Tika XHTML elements into separate fields,
named after the element.  For example, when extracting from HTML, Tika can return the href
attributes in <a> tags as fields named "a". See the examples below.
   * xpath=<XPath expression> - When extracting, only return Tika XHTML content that
satisfies the XPath expression.  See the Tika documentation for details on the format of
Tika XHTML.  See also TikaExtractOnlyExampleOutput.
   * lowernames=true|false - Map all field names to lowercase with underscores.  For example,
Content-Type would be mapped to content_type.
+  * literalsOverride=true|false - <!> [[Solr4.0]] When true, literal field values will
override other values with the same field name, such as metadata and content. Default: true
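
As a rough illustration of how the parameters listed above might be combined in one request, the following Python sketch builds a query URL. The host, port, and the /update/extract handler path are assumptions about a local example setup, the XPath value is only illustrative, and build_extract_url is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, **params):
    """Return base_url plus a query string, rendering booleans as true/false."""
    rendered = {
        key: ("true" if value else "false") if isinstance(value, bool) else value
        for key, value in params.items()
    }
    return f"{base_url}?{urlencode(rendered)}"


url = build_extract_url(
    # Assumed location of the extracting handler on a local example install.
    "http://localhost:8983/solr/update/extract",
    captureAttr=True,        # index element attributes into separate fields
    lowernames=True,         # e.g. Content-Type becomes content_type
    literalsOverride=False,  # keep extracted values alongside literal ones
    # Illustrative XPath only; adjust to the Tika XHTML actually produced.
    xpath="/xhtml:html/xhtml:body/xhtml:div/descendant:node()",
)
print(url)
```

The document to be extracted would still have to be sent as the request body, for example as a multipart upload; that step is omitted here.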
  If extractOnly is true, additional input parameters:
