jackrabbit-dev mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (JCR-1530) MsPowerPointTextExtractor does not extract from PPTs with € sign
Date Fri, 11 Apr 2008 11:17:06 GMT

     [ https://issues.apache.org/jira/browse/JCR-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger resolved JCR-1530.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.5

Applied patch in revision: 647114

Thank you for providing the patch.

> MsPowerPointTextExtractor does not extract from PPTs with € sign
> ----------------------------------------------------------------
>
>                 Key: JCR-1530
>                 URL: https://issues.apache.org/jira/browse/JCR-1530
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: jackrabbit-text-extractors
>    Affects Versions: 1.4
>            Reporter: Dirk Feufel
>             Fix For: 1.5
>
>
> The MsPowerPointTextExtractor class fails on PPTs that contain a € sign: all text
> following that sign is ignored. Perhaps the POI PowerPointExtractor should be used
> instead of parsing the data by hand. As a side effect, this would simplify the code.
> Extracting could be done as follows:
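> 	public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
> 		try {
> 			// Let POI parse the presentation instead of reading records by hand
> 			PowerPointExtractor extractor = new PowerPointExtractor(stream);
> 			// getText(getSlideText, getNotesText): include slide text and notes text
> 			return new StringReader(extractor.getText(true, true));
> 		} catch (RuntimeException e) {
> 			// Log parser failures and return empty content instead of propagating them
> 			logger.warn("Failed to extract PowerPoint text content", e);
> 			return new StringReader("");
> 		} finally {
> 			// Always close the input stream, on success and on failure
> 			try { stream.close(); } catch (IOException ignored) {}
> 		}
> 	}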
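The suggested method depends on Apache POI. To show its error-handling shape on its own, here is a minimal, self-contained sketch in which a hypothetical TextSource interface stands in for POI's PowerPointExtractor. The names SafeExtractorSketch, TextSource and drain are illustrative assumptions, not Jackrabbit or POI API, and java.util.logging replaces the original logger so the sketch has no dependencies. It does not reproduce the € bug itself; it only demonstrates that text after a € sign passes through the returned Reader intact, and that a parser failure yields empty content while the stream is still closed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SafeExtractorSketch {

    /** Hypothetical stand-in for a real parser such as POI's PowerPointExtractor. */
    @FunctionalInterface
    interface TextSource {
        String read(InputStream stream) throws IOException;
    }

    private static final Logger logger =
            Logger.getLogger(SafeExtractorSketch.class.getName());

    /** Same shape as the suggested extractText: parser bugs become empty text, stream always closed. */
    static Reader extractText(InputStream stream, TextSource source) throws IOException {
        try {
            return new StringReader(source.read(stream));
        } catch (RuntimeException e) {
            // Parser bug or corrupt file: log it and return no text rather than failing
            logger.log(Level.WARNING, "Failed to extract text content", e);
            return new StringReader("");
        } finally {
            // Close the stream on every path, success or failure
            try {
                stream.close();
            } catch (IOException ignored) {
                // nothing sensible to do here
            }
        }
    }

    /** Reads a Reader fully into a String (helper for the demo). */
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A parser that returns text containing the euro sign (U+20AC)
        String ok = drain(extractText(new ByteArrayInputStream(new byte[0]),
                in -> "Price: 10 \u20ac, and the text after the sign"));
        System.out.println(ok.endsWith("text after the sign")); // true

        // A parser that throws: the caller gets an empty Reader, not an exception
        String failed = drain(extractText(new ByteArrayInputStream(new byte[0]),
                in -> { throw new IllegalStateException("corrupt record"); }));
        System.out.println(failed.isEmpty()); // true
    }
}
```

Catching only RuntimeException mirrors the suggestion: genuine I/O errors still propagate to the caller, while unexpected parser failures are logged and turned into empty content so that indexing of other documents can continue.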

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

