lucene-solr-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Resolved: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
Date Sat, 08 Aug 2009 01:41:14 GMT


Grant Ingersoll resolved SOLR-1274.

    Resolution: Fixed

I committed this patch, plus a test for it.

> Provide multiple output formats in extract-only mode for tika handler
> ---------------------------------------------------------------------
>                 Key: SOLR-1274
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Peter Wolanin
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.4
>         Attachments: SOLR-1274.patch, SOLR-1274.patch
> The proposed feature is to accept a URL parameter when using extract-only mode to specify
> an output format. This parameter might just overload the existing "ext.extract.only" so that
> one can optionally specify a format, e.g. false|true|xml|text, where true and xml give the
> same response (i.e. xml remains the default)
> I had been assuming that I could choose among possible tika output
> formats when using the extracting request handler in extract-only mode
> as if from the CLI with the tika jar:
>    -x or --xml        Output XHTML content (default)
>    -h or --html       Output HTML content
>    -t or --text       Output plain text content
>    -m or --metadata   Output only metadata
> However, looking at the docs and source, it seems that only the xml
> option is available (hard-coded) in
> {code}
> serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
> {code}
> Providing at least a plain-text response seems to work if you change the serializer to
> a TextSerializer (org.apache.xml.serialize.TextSerializer).
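
The proposal in the issue description can be illustrated with a minimal, hedged sketch. It is not the committed patch and does not use the handler's actual code: the class name, the enum, and both method names are hypothetical, and the JDK's javax.xml.transform identity transformer is swapped in for the org.apache.xml.serialize XMLSerializer/TextSerializer pair so that the example runs without extra jars. It assumes the overloaded parameter takes the values false|true|xml|text as described above, with true and xml treated as equivalent.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration only; not the code committed for SOLR-1274.
public class ExtractOnlyFormatSketch {

    /** Hypothetical values for an overloaded extract-only parameter. */
    public enum Format { NONE, XML, TEXT }

    /** Maps false|true|xml|text to a format; "true" and "xml" are equivalent. */
    public static Format parse(String value) {
        if (value == null) {
            return Format.NONE; // parameter absent: extract-only mode is off
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "false":
                return Format.NONE;
            case "true":
            case "xml":
                return Format.XML; // xml remains the default output
            case "text":
                return Format.TEXT;
            default:
                throw new IllegalArgumentException("Unknown extract-only value: " + value);
        }
    }

    /** Serializes already-extracted XHTML either as markup or as plain text. */
    public static String render(String xhtml, Format format) {
        if (format == Format.NONE) {
            throw new IllegalArgumentException("extract-only mode is not enabled");
        }
        try {
            // An identity transform copies the input; the output method decides
            // whether markup is kept ("xml") or only text nodes are written ("text").
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD,
                    format == Format.TEXT ? "text" : "xml");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xhtml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize extracted content", e);
        }
    }

    public static void main(String[] args) {
        String extracted = "<html><body><p>Hello Tika</p></body></html>";
        System.out.println(render(extracted, parse("true")));
        System.out.println(render(extracted, parse("text")));
    }
}
```

Keeping the parameter parsing separate from the serializer choice means further formats (for example HTML or metadata-only, as in the tika CLI options quoted above) could be added as new enum values without touching the request handling.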

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
