lucene-solr-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Assigned: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
Date Sat, 08 Aug 2009 01:41:14 GMT

     [ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-1274:
-------------------------------------

    Assignee: Grant Ingersoll

> Provide multiple output formats in extract-only mode for tika handler
> ---------------------------------------------------------------------
>
>                 Key: SOLR-1274
>                 URL: https://issues.apache.org/jira/browse/SOLR-1274
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Peter Wolanin
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-1274.patch, SOLR-1274.patch
>
>
> The proposed feature is to accept a URL parameter when using extract-only mode to specify
> an output format. This parameter might simply overload the existing "ext.extract.only" so that
> one can optionally specify a format, e.g. false|true|xml|text, where true and xml give the
> same response (i.e. xml remains the default).
> I had been assuming that I could choose among possible tika output
> formats when using the extracting request handler in extract-only mode
> as if from the CLI with the tika jar:
>    -x or --xml        Output XHTML content (default)
>    -h or --html       Output HTML content
>    -t or --text       Output plain text content
>    -m or --metadata   Output only metadata
> However, looking at the docs and source, it seems that only the xml
> option is available (hard-coded) in ExtractingDocumentLoader.java
> {code}
> serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
> {code}
> Providing at least a plain-text response seems to work if you change the serializer to
> a TextSerializer (org.apache.xml.serialize.TextSerializer).
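
The overloaded parameter described in the issue could be interpreted roughly as in the sketch below. It is only an illustration, not the attached patch: the class `ExtractOnlyFormat`, its `Format` enum, and the `parse` method are hypothetical names, and treating a missing value as "false" is an assumption. In the loader, `XML` would presumably keep the existing XMLSerializer and `TEXT` would switch to the TextSerializer mentioned above; that wiring is left out so the example needs only the JDK.

```java
import java.util.Locale;

/** Hypothetical helper: interprets an overloaded extract-only parameter value. */
public final class ExtractOnlyFormat {

    /** Output choices named in the proposal; NONE means extract-only mode is off. */
    public enum Format { NONE, XML, TEXT }

    private ExtractOnlyFormat() {}

    /**
     * Maps false|true|xml|text to a Format. A missing or blank value is treated
     * as "false" (an assumption), and "true" is an alias for "xml" so that XML
     * remains the default output.
     */
    public static Format parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Format.NONE;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "false":
                return Format.NONE;
            case "true":
            case "xml":
                return Format.XML;
            case "text":
                return Format.TEXT;
            default:
                // Reject unknown formats instead of silently falling back to XML.
                throw new IllegalArgumentException("Unsupported extract-only value: " + value);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"false", "true", "xml", "text"}) {
            System.out.println(v + " -> " + parse(v));
        }
    }
}
```

Rejecting unknown values is a design choice of this sketch; a handler could equally fall back to XML to stay lenient with older clients.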

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

