lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Controlling Tika's metadata
Date Wed, 02 Feb 2011 16:13:44 GMT

On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:

> Just getting my feet wet with the text extraction using both schema and 
> solrconfig settings from the example directory in the 1.4 distribution, so I 
> might miss something obvious.
> 
> Trying to provide my own title (and discarding the one received through Tika's 
> metadata) wasn't straightforward. I had to use the following:
> 
> fmap.title=tika_title (to discard the Tika title)
> literal.attr_title=New Title (to provide the correct one)
> fmap.attr_title=title (to map it back to the field as I would like to use title 
> in searches)
> 
> Is there anything easier than the above?
> 
> How can this best be generalized to other metadata provided by Tika (which in 
> our use case will be mostly ignored, as it is provided separately)?

You can provide your own ContentHandler (see the wiki docs). I think it would be reasonable
to patch the ExtractingRequestHandler to add a no-metadata option, and it wouldn't be that
hard to do.
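
For reference, the three-parameter workaround quoted above could be assembled into a request URL roughly as follows. This is only a sketch: the host, port, and /update/extract path are assumptions based on the stock example configuration, build_extract_url is a hypothetical helper name, and the parameter values are the ones from Andreas's message. It only builds the URL and does not send anything to Solr.

```python
from urllib.parse import urlencode

def build_extract_url(base_url, new_title):
    """Build an extract-handler URL that swaps Tika's title for our own.

    Hypothetical helper; the parameter names come from the message above.
    """
    params = [
        ("fmap.title", "tika_title"),        # move Tika's title out of the way
        ("literal.attr_title", new_title),   # supply the title we actually want
        ("fmap.attr_title", "title"),        # map our literal onto the title field
    ]
    return base_url + "?" + urlencode(params)

# Assumed endpoint from the example setup; adjust to your installation.
url = build_extract_url("http://localhost:8983/solr/update/extract", "New Title")
print(url)
```

The same pattern would have to be repeated for each metadata field Tika supplies, which is why a custom ContentHandler or a no-metadata option would likely scale better.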