jackrabbit-dev mailing list archives

From "Dirk Feufel (JIRA)" <j...@apache.org>
Subject [jira] Created: (JCR-1530) MsPowerPointTextExtractor does not extract from PPTs with € sign
Date Thu, 10 Apr 2008 11:40:47 GMT
MsPowerPointTextExtractor does not extract from PPTs with € sign

                 Key: JCR-1530
                 URL: https://issues.apache.org/jira/browse/JCR-1530
             Project: Jackrabbit
          Issue Type: Bug
          Components: jackrabbit-text-extractors
    Affects Versions: 1.4
            Reporter: Dirk Feufel

The MsPowerPointTextExtractor class fails on PowerPoint files that contain a € sign: all text
following the sign is ignored. Perhaps the POI PowerPointExtractor should be used instead of
parsing the data by hand. As a side effect, this would simplify the code. Extraction could be
done as follows:

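	// Requires org.apache.poi.hslf.extractor.PowerPointExtractor on the classpath.
	public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
		try {
			PowerPointExtractor extractor = new PowerPointExtractor(stream);
			// Extract both the slide text and the notes text.
			return new StringReader(extractor.getText(true, true));
		} catch (RuntimeException e) {
			// Treat an unparsable document as having no text instead of failing.
			logger.warn("Failed to extract PowerPoint text content", e);
			return new StringReader("");
		} finally {
			// Always release the stream, whether or not extraction succeeded.
			try { stream.close(); } catch (IOException ignored) {}
		}
	}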
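
The control flow of the snippet above (empty result on a RuntimeException, stream always
closed) can be checked without POI on the classpath. What follows is only a minimal sketch
under stated assumptions: TextSource, TrackingStream, decodeUtf16le and the class name are
hypothetical stand-ins invented for illustration, and the UTF-16LE decoder merely takes the
place of the real POI call. It is not Jackrabbit or POI code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class ExtractFallbackSketch {

    /** Hypothetical stand-in for the POI extractor call; not a real POI or Jackrabbit type. */
    interface TextSource {
        String read(InputStream in) throws IOException;
    }

    /** Same control flow as the proposed extractText: empty text on failure, stream always closed. */
    static Reader extract(InputStream stream, TextSource source) throws IOException {
        try {
            return new StringReader(source.read(stream));
        } catch (RuntimeException e) {
            return new StringReader("");
        } finally {
            try { stream.close(); } catch (IOException ignored) {}
        }
    }

    /** Assumed sample source: decodes the whole stream as UTF-16LE text. */
    static String decodeUtf16le(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            buf.write(b);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_16LE);
    }

    /** Reads a Reader to the end and returns its content. */
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    /** Input stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Success path: text containing a euro sign passes through the wrapper unchanged.
        byte[] data = "Price: 5 \u20ac total".getBytes(StandardCharsets.UTF_16LE);
        TrackingStream good = new TrackingStream(data);
        System.out.println(drain(extract(good, ExtractFallbackSketch::decodeUtf16le)));
        System.out.println("closed after success: " + good.closed);

        // Failure path: a RuntimeException from the source yields an empty reader.
        TrackingStream bad = new TrackingStream(new byte[0]);
        String text = drain(extract(bad, in -> { throw new IllegalStateException("corrupt"); }));
        System.out.println("text length after failure: " + text.length());
        System.out.println("closed after failure: " + bad.closed);
    }
}
```

One consequence of this design is that a caller cannot tell a corrupt file from one with no
text, since both produce an empty reader; only the log records the difference.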

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.