lucene-dev mailing list archives

From "Emmanuel Espina (Commented) (JIRA)" <>
Subject [jira] [Commented] (SOLR-3246) UpdateRequestProcessor to extract Solr XML from rich documents
Date Fri, 16 Mar 2012 18:01:41 GMT


Emmanuel Espina commented on SOLR-3246:

The output format could probably be selected the same way it is done for the response
writers. The XMLWritingUpdateProcessor would then become a plain WritingUpdateProcessor,
with the writer chosen by a parameter in the configuration and a default (either xml or
csv) used when none is given. That would be:

<updateRequestProcessorChain name="writer">
    <processor class="org.apache.solr.update.processor.WritingUpdateProcessorFactory">
      <str name="outputDir">./dacDumps</str>
      <str name="writer">xml</str>
      <str name="groupFiles">100</str>
    </processor>
</updateRequestProcessorChain>

Another parameter could also control how many documents are written to the same file: one, n, or unlimited.
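
To make the idea concrete, here is a rough sketch of how the writer selection and the grouping of documents into files might look. It is not taken from the attached patch: the class WritingSketch, its methods, and the convention that a group size of 0 or less means "unlimited" are all assumptions made for illustration, and it deliberately avoids the Solr API so it can run on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; none of these names come from the SOLR-3246 patch. */
class WritingSketch {

    /** Output formats keyed by the value of the "writer" parameter. */
    static final Map<String, Function<Map<String, String>, String>> WRITERS = new LinkedHashMap<>();
    static {
        WRITERS.put("xml", WritingSketch::toXml);
        WRITERS.put("csv", WritingSketch::toCsv);
    }

    /** Assumed default when no "writer" parameter is configured. */
    static final String DEFAULT_WRITER = "xml";

    /** Looks up a writer by name, falling back to the default when name is null. */
    static Function<Map<String, String>, String> selectWriter(String name) {
        String key = (name == null) ? DEFAULT_WRITER : name;
        Function<Map<String, String>, String> writer = WRITERS.get(key);
        if (writer == null) {
            throw new IllegalArgumentException("Unknown writer: " + key);
        }
        return writer;
    }

    /** Renders one document as a <doc> element with one <field> per entry. */
    static String toXml(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("<doc>");
        for (Map.Entry<String, String> e : doc.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc>").toString();
    }

    /** Renders the field values as one quoted CSV row (embedded quotes doubled). */
    static String toCsv(Map<String, String> doc) {
        List<String> cells = new ArrayList<>();
        for (String value : doc.values()) {
            cells.add("\"" + value.replace("\"", "\"\"") + "\"");
        }
        return String.join(",", cells);
    }

    /** Minimal XML escaping; the ampersand is replaced first to avoid double escaping. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Splits rendered documents into per-file groups; groupSize <= 0 means unlimited. */
    static List<List<String>> group(List<String> rendered, int groupSize) {
        List<List<String>> files = new ArrayList<>();
        if (rendered.isEmpty()) {
            return files;
        }
        if (groupSize <= 0) {
            files.add(new ArrayList<>(rendered));
            return files;
        }
        for (int i = 0; i < rendered.size(); i += groupSize) {
            int end = Math.min(i + groupSize, rendered.size());
            files.add(new ArrayList<>(rendered.subList(i, end)));
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("title", "Tom & Jerry");
        System.out.println(selectWriter(null).apply(doc));   // default writer
        System.out.println(selectWriter("csv").apply(doc));  // explicit writer
    }
}
```

With something like this, a group size of 1 gives one file per document, n gives n documents per file, and 0 puts everything in a single file. A real processor would also have to decide when to flush a partially filled group, which the sketch leaves out.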

> UpdateRequestProcessor to extract Solr XML from rich documents
> --------------------------------------------------------------
>                 Key: SOLR-3246
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Emmanuel Espina
>            Priority: Minor
>         Attachments: SOLR-3246.patch
> This would be an update request handler to save a file with the xml that represents the
> document in an external directory. The original idea behind this was to add it to the
> processing chain of the ExtractingRequestHandler to store an already parsed version of
> the docs. This storage of pre-parsed documents will make the re-indexing of the entire
> index faster (avoiding the Tika phase, and just sending the xml to the standard update
> processor).
> As a side effect, extracting the xml can make debugging of rich docs easier.
