incubator-ctakes-notifications mailing list archives

From "Pei Chen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CTAKES-105) Add Apache Tika integration
Date Tue, 27 Nov 2012 17:19:58 GMT
Pei Chen created CTAKES-105:
-------------------------------

             Summary: Add Apache Tika integration
                 Key: CTAKES-105
                 URL: https://issues.apache.org/jira/browse/CTAKES-105
             Project: cTAKES
          Issue Type: New Feature
            Reporter: Pei Chen
            Priority: Minor
             Fix For: future enhancement


It would be nice to add a utility/pre-processor that accepts any document type (scanned PDF,
image, Word, PDF, XLS, etc.), uses something like Apache Tika to automatically detect the type,
run OCR where needed, and extract the plain text, and then feeds that text to the pipeline.
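
A rough sketch of the shape such a pre-processor could take (detect the type, pick an
extractor, hand plain text to the pipeline). It is only an illustration under assumptions:
it does not call Tika. The magic-byte table, the type labels, and the names detect_type,
extract_plain_text and preprocess are all hypothetical stand-ins for the detection and
extraction that a library like Tika would provide, and the "pipeline" is just any callable
that takes a string.

```python
from typing import Callable, Dict

# Hypothetical magic-byte table: a stand-in for real type detection.
# The labels on the right are illustrative, not taken from any library.
MAGIC_PREFIXES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),  # container used by .docx/.xlsx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),  # legacy .doc/.xls
]

Extractor = Callable[[bytes], str]


def detect_type(data: bytes) -> str:
    """Guess a type label from the leading bytes of a document."""
    for prefix, label in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return label
    # No known signature: treat valid UTF-8 as plain text, anything else as unknown.
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


def extract_plain_text(data: bytes) -> str:
    """Trivial extractor for documents that are already plain text."""
    return data.decode("utf-8")


def preprocess(data: bytes,
               extractors: Dict[str, Extractor],
               pipeline: Callable[[str], object]) -> object:
    """Detect the type, extract plain text, and feed it to the pipeline."""
    label = detect_type(data)
    extractor = extractors.get(label)
    if extractor is None:
        # A PDF or image extractor (with OCR) would be registered here.
        raise ValueError(f"no extractor registered for {label}")
    return pipeline(extractor(data))
```

In this sketch, OCR and format-specific parsing would live behind the extractors mapping, so
the pipeline itself only ever sees plain text regardless of the input format.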


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
