lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "ExtractingRequestHandler" by GrantIngersoll
Date Sat, 15 Nov 2008 12:45:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GrantIngersoll:

+ In the defaults section, we are mapping Tika's Last-Modified metadata attribute to a field
named last_modified.  We are also telling it to ignore undeclared fields.  These defaults can all be overridden by request parameters.
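The defaults described above apply only when a request does not supply its own value. A minimal plain-Java sketch of that layering follows. It is not Solr's implementation. The `Last-Modified` key follows the `<Tika Metadata Attribute> = Solr Field Name` form from the Input Parameters section, and treating parameters as a flat string map is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class DefaultsSketch {
    // Layer request parameters over handler defaults: a key present in the
    // request wins; a key absent from the request falls back to the default.
    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(request);
        return merged;
    }

    public static void main(String[] args) {
        // Defaults mirroring the text (key form for the mapping is assumed).
        Map<String, String> defaults = new HashMap<>();
        defaults.put("Last-Modified", "last_modified");
        defaults.put("ext.ignore.und.fl", "true");

        // A request that overrides only the ignore-undeclared-fields flag.
        Map<String, String> request = new HashMap<>();
        request.put("ext.ignore.und.fl", "false");

        Map<String, String> params = effectiveParams(defaults, request);
        System.out.println(params.get("Last-Modified"));     // last_modified
        System.out.println(params.get("ext.ignore.und.fl")); // false
    }
}
```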
+ The tika.config entry points to a file containing a Tika configuration.  You only
need this if you have customized your own Tika configuration.  The Tika config contains information
about parsers, MIME types, etc.
+ Lastly, date.formats allows you to specify various java.text.SimpleDateFormat date formats
for transforming extracted input into a Date.  Solr comes configured with the following
date formats (see the DateUtil class in Solr):
+ {{{
+ yyyy-MM-dd'T'HH:mm:ss'Z'
+ yyyy-MM-dd'T'HH:mm:ss
+ yyyy-MM-dd
+ yyyy-MM-dd hh:mm:ss
+ yyyy-MM-dd HH:mm:ss
+ EEE MMM d hh:mm:ss z yyyy
+ EEE, dd MMM yyyy HH:mm:ss zzz
+ EEEE, dd-MMM-yy HH:mm:ss zzz
+ EEE MMM d HH:mm:ss yyyy
+ }}}
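One way to apply a list like this is to try each pattern in order and keep the first one that parses the whole input. The sketch below is a hypothetical helper, not Solr's DateUtil. It assumes US-English day and month names, treats inputs without a zone as UTC, disables lenient parsing, and requires the entire string to be consumed, because a plain `SimpleDateFormat.parse` with `yyyy-MM-dd` would otherwise accept just the date prefix of a longer timestamp.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatsSketch {
    // The formats listed above, tried in the order given.
    static final String[] FORMATS = {
        "yyyy-MM-dd'T'HH:mm:ss'Z'",
        "yyyy-MM-dd'T'HH:mm:ss",
        "yyyy-MM-dd",
        "yyyy-MM-dd hh:mm:ss",
        "yyyy-MM-dd HH:mm:ss",
        "EEE MMM d hh:mm:ss z yyyy",
        "EEE, dd MMM yyyy HH:mm:ss zzz",
        "EEEE, dd-MMM-yy HH:mm:ss zzz",
        "EEE MMM d HH:mm:ss yyyy"
    };

    // Returns the first pattern's result that consumes the whole input, or null.
    static Date parse(String text) {
        for (String pattern : FORMATS) {
            // SimpleDateFormat is not thread-safe, so build a fresh one per attempt.
            SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
            fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumed zone when input has none
            fmt.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date d = fmt.parse(text, pos);
            // Reject partial matches such as "yyyy-MM-dd" on a full timestamp.
            if (d != null && pos.getIndex() == text.length()) {
                return d;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parse("2008-11-15T12:45:28Z").toInstant());          // 2008-11-15T12:45:28Z
        System.out.println(parse("Sat, 15 Nov 2008 12:45:28 GMT").toInstant()); // 2008-11-15T12:45:28Z
        System.out.println(parse("not a date"));                                // null
    }
}
```

In this sketch order matters: because `yyyy-MM-dd hh:mm:ss` (12-hour clock) is tried before `yyyy-MM-dd HH:mm:ss`, an input such as `2008-11-15 12:45:28` matches the 12-hour pattern and is read as 00:45:28.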
  = Input Parameters =
   *<Tika Metadata Attribute> = Solr Field Name - Map a Tika metadata attribute
to a Solr field name.  If no mapping is specified, the metadata attribute will be used as
the field name.  If the field name doesn't exist in the schema, it can be ignored by setting the "ignore
undeclared fields" (ext.ignore.und.fl) parameter described below.
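The resolution rule in this parameter description (explicit mapping first, otherwise the attribute name itself, then an optional drop for undeclared fields) can be sketched as below. `resolveField` is a hypothetical helper, not part of Solr; the schema is modeled as a plain set of field names (dynamic fields are ignored), and throwing when the flag is off is an assumption standing in for Solr rejecting an unknown field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class FieldMappingSketch {
    // Resolve the Solr field for one Tika metadata attribute.
    // Returns null when the attribute should be dropped.
    static String resolveField(String tikaAttr,
                               Map<String, String> mappings,
                               Set<String> schemaFields,
                               boolean ignoreUndeclared) {
        // No explicit mapping: fall back to the metadata attribute name itself.
        String field = mappings.getOrDefault(tikaAttr, tikaAttr);
        if (schemaFields.contains(field)) {
            return field;
        }
        if (ignoreUndeclared) {
            return null; // undeclared field silently skipped
        }
        // Assumed behavior when the flag is off: treat it as an error.
        throw new IllegalArgumentException("undeclared field: " + field);
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<>();
        mappings.put("Last-Modified", "last_modified");
        Set<String> schema = Set.of("last_modified", "title");

        System.out.println(resolveField("Last-Modified", mappings, schema, true)); // last_modified
        System.out.println(resolveField("title", mappings, schema, true));         // title
        System.out.println(resolveField("Content-Type", mappings, schema, true));  // null
    }
}
```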
