lucene-dev mailing list archives

From Emmanuel Espina <espinaemman...@gmail.com>
Subject UpdateRequestProcessor to extract Solr XML from rich documents
Date Wed, 14 Mar 2012 16:07:46 GMT
I've created an update request processor that saves a file containing the
XML that represents each document to an external directory. The original
idea was to add it to the processing chain of the
ExtractingRequestHandler to store an already parsed version of the
docs. Storing pre-parsed documents this way would make re-indexing
the entire index faster (the Tika phase is skipped and the XML is simply
sent to the standard update processor).
As a side effect, extracting the XML can make debugging rich docs easier.
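A minimal, hypothetical sketch of the core step (not the attached POC): serialize an already extracted document into Solr's XML update format (`<add><doc><field name="...">`) and write it to an external directory. To stay self-contained it uses a plain `Map` of field names to string values as a stand-in for Solr's `SolrInputDocument`; the names `SolrXmlStore`, `toSolrXml` and `store` are made up for illustration. In a real update request processor this logic would presumably run inside `processAdd` before the command is passed further down the chain.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: store an extracted document as Solr XML so it can be
 * re-posted later without running Tika again. A Map of field name -> values
 * stands in for SolrInputDocument (assumption, to avoid a Solr dependency).
 */
public class SolrXmlStore {

    // Escape characters that are significant in XML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build an <add><doc>...</doc></add> message; a multi-valued field
    // becomes repeated <field> elements with the same name.
    static String toSolrXml(Map<String, List<String>> doc) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            for (String v : e.getValue()) {
                sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
                  .append(escape(v)).append("</field>");
            }
        }
        return sb.append("</doc></add>").toString();
    }

    // Write the XML to <dir>/<id>.xml. The id is sanitized so it is a safe
    // file name (simplification: distinct ids may map to the same file).
    static Path store(Path dir, String id, Map<String, List<String>> doc) throws IOException {
        Files.createDirectories(dir);
        String safeName = id.replaceAll("[^A-Za-z0-9._-]", "_");
        Path file = dir.resolve(safeName + ".xml");
        Files.write(file, toSolrXml(doc).getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Field order is preserved by LinkedHashMap, so the output is stable.
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", Arrays.asList("doc1"));
        doc.put("title", Arrays.asList("Q&A <draft>"));
        doc.put("author", Arrays.asList("Ann", "Bob"));

        Path dir = Files.createTempDirectory("solr-xml");
        Path file = store(dir, "doc1", doc);
        System.out.println(file + ": " + toSolrXml(doc));
    }
}
```

Re-indexing would then amount to posting each stored file to Solr's XML update handler instead of sending the original binaries through ExtractingRequestHandler. Naming files by the sanitized unique key is only an assumption here; a real implementation would need a collision-safe naming scheme.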

I'm attaching a first, very simple POC (proof of concept). What do you
think about opening a JIRA issue for it?

Thanks
Emmanuel
