incubator-clerezza-dev mailing list archives

From "Davide Palmisano (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CLEREZZA-182) Integrate Apache Tika inside Apache Clerezza
Date Sun, 26 Sep 2010 19:21:33 GMT

    [ https://issues.apache.org/jira/browse/CLEREZZA-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915021#action_12915021 ]

Davide Palmisano commented on CLEREZZA-182:
-------------------------------------------

Dear Tommaso,

In the attached patch[1] (taken from /trunk/org.apache.clerezza.parent/org.apache.clerezza.uima/org.apache.clerezza.uima.metadata-generator)
you can find an attempt to integrate Apache Tika 0.7 by implementing the MediaTypeTextExtractor
interface. My changes include:

1) the Tika dependency added to pom.xml;
2) two tests, one for my implementation (TikaTextExtractor) and one for your PlainTextExtractor
class;
3) additional Javadoc on the MediaTypeTextExtractor interface;
4) a couple of new constructors for UnsupportedMediaTypeException.
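The extractor contract described above can be illustrated with a minimal, stdlib-only sketch. The method names `supports` and `extract`, their signatures, and the exact exception constructors are assumptions for illustration; the real MediaTypeTextExtractor interface in the patch may look different. A TikaTextExtractor would presumably implement the same interface but delegate to Tika's parsing API (for example `AutoDetectParser`); it is left out here because it needs the Tika jar on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical shape of the extractor contract (assumption, not the patch's API).
 */
interface MediaTypeTextExtractor {
    /** Returns true if this extractor can handle the given media type. */
    boolean supports(String mediaType);

    /**
     * Extracts plain text from the stream.
     *
     * @throws UnsupportedMediaTypeException if the media type is not supported
     */
    String extract(InputStream in, String mediaType)
            throws IOException, UnsupportedMediaTypeException;
}

/** Checked exception with message-only and message-plus-cause constructors (assumed). */
class UnsupportedMediaTypeException extends Exception {
    UnsupportedMediaTypeException(String message) {
        super(message);
    }

    UnsupportedMediaTypeException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Illustrative plain-text extractor: decodes the whole stream as UTF-8. */
class PlainTextExtractor implements MediaTypeTextExtractor {
    @Override
    public boolean supports(String mediaType) {
        // Accept "text/plain" with or without parameters such as "; charset=...".
        return mediaType != null && mediaType.startsWith("text/plain");
    }

    @Override
    public String extract(InputStream in, String mediaType)
            throws IOException, UnsupportedMediaTypeException {
        if (!supports(mediaType)) {
            throw new UnsupportedMediaTypeException("Unsupported media type: " + mediaType);
        }
        // Read the stream fully into memory, then decode it.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Keeping the media-type check inside `extract` means callers get an UnsupportedMediaTypeException instead of garbage text when they pass a format the extractor cannot handle.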

Let me know if it fits your needs.

Davide

[1] CLEREZZA-182.patch

> Integrate Apache Tika inside Apache Clerezza
> --------------------------------------------
>
>                 Key: CLEREZZA-182
>                 URL: https://issues.apache.org/jira/browse/CLEREZZA-182
>             Project: Clerezza
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>         Attachments: CLEREZZA-182.patch
>
>
> Apache Tika is a toolkit for detecting and extracting metadata and structured text content
> from various documents using existing parser libraries, and it would be nice to have it
> integrated inside Apache Clerezza so that resources could be easily enriched and auto-tagged
> with metadata once inside Clerezza.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

