lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Trivial Update of "ExtractingRequestHandler" by GrantIngersoll
Date Mon, 24 Nov 2008 19:57:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/ExtractingRequestHandler

------------------------------------------------------------------------------
   * ext.extract.only = true|false - Default is false.  If true, return the extracted content
from Tika without indexing the document.  This literally includes the extracted XHTML as a
<str> in the response.  See TikaExtractOnlyExampleOutput.
   * ext.idx.attr = true|false - Index the Tika XHTML attributes into separate fields, named
after the attribute.  For example, when extracting from HTML, Tika can return the href values
of <a> tags as attributes of a tag name.  See the examples below.
   * ext.def.fl = <NAME> - The name of the field to add the default content to.  See
also ext.capture below.  This NAME is not mapped, but it can be boosted.
-  * ext.capture = <Tika XHTML NAME> - Capture fields with the name separately for adding
to the Solr document.
+  * ext.capture = <Tika XHTML NAME> - Capture fields with the name separately for adding
to the Solr document.  This can be useful for grabbing chunks of the XHTML into a separate
field.  For instance, it could be used to grab paragraphs (<p>) and index them into
a separate field.  Note that content is also still captured into the overall string buffer.
  
  
  = Examples =
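
As a minimal sketch of how the parameters listed above might be combined, the following Python builds a request URL that captures <p> content separately while sending the remaining content to a default field. The Solr base URL, the handler path, and the field name "text" are assumptions made for illustration and do not come from the wiki page; only the ext.* parameter names are taken from the list above. The sketch only assembles the URL and does not send a document.

```python
from urllib.parse import urlencode

# Assumed values for illustration only; adjust to the actual deployment.
SOLR_BASE = "http://localhost:8983/solr"  # hypothetical Solr location
HANDLER_PATH = "/update/extract"          # hypothetical handler mapping


def build_extract_url(capture=None, default_field=None,
                      index_attributes=False, extract_only=False):
    """Assemble a URL carrying the ext.* parameters described above."""
    params = []
    if extract_only:
        # Return the extracted content from Tika without indexing.
        params.append(("ext.extract.only", "true"))
    if index_attributes:
        # Index Tika XHTML attributes into fields named after the attribute.
        params.append(("ext.idx.attr", "true"))
    if default_field is not None:
        # Field that receives the default content.
        params.append(("ext.def.fl", default_field))
    if capture is not None:
        # Tika XHTML element name to capture separately, e.g. "p".
        params.append(("ext.capture", capture))
    return SOLR_BASE + HANDLER_PATH + "?" + urlencode(params)


if __name__ == "__main__":
    # "text" is a hypothetical field name in the Solr schema.
    print(build_extract_url(capture="p", default_field="text"))
```

Keeping the parameters as an ordered list of pairs makes the resulting query string predictable. The document itself would still have to be supplied in the request body, which this sketch leaves out.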
