incubator-any23-dev mailing list archives

From "Lewis John McGibbney (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ANY23-26) Upgrade dependency to Apache Tika 1.1
Date Mon, 16 Apr 2012 16:24:17 GMT

     [ https://issues.apache.org/jira/browse/ANY23-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated ANY23-26:
--------------------------------------

    Attachment: 19-object-data-data-uri.html
                14-img-src-data-url.html
                org.apache.any23.extractor.html.HCardExtractorTest.txt
                ANY23-26.patch

Initial WIP. This breaks HCardExtractorTest#testImgSrcDataUrl and #testObjectDataDataUri.


I've attached my failing tests, along with the two HTML documents on which the tests currently
fail. Both tests seem to fail in either AbstractExtractorTestCase#assertExtract or
HCardExtractorTest#assertDefaultVCard...
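
For anyone not familiar with the inputs: judging only by their file names, the two attached
documents appear to exercise "data:" URIs in an img src attribute and an object data attribute.
That is an assumption, since the attachment contents are not reproduced here. The sketch below
shows, using only the JDK, what such a URI carries (a media type plus an inline payload). The
class DataUriSketch and its methods are hypothetical names for illustration; this is not the
Any23 or Tika code path and is not a diagnosis of the failures.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Any23 or Tika. */
public final class DataUriSketch {

    /** Media type and decoded payload of a data: URI. */
    public static final class Parsed {
        public final String mediaType;
        public final byte[] payload;

        Parsed(String mediaType, byte[] payload) {
            this.mediaType = mediaType;
            this.payload = payload;
        }
    }

    private DataUriSketch() { }

    /** Splits "data:[mediatype][;base64],payload" into its parts. */
    public static Parsed parse(String uri) {
        if (uri == null || !uri.regionMatches(true, 0, "data:", 0, 5)) {
            throw new IllegalArgumentException("not a data: URI");
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("missing ',' separator");
        }
        String header = uri.substring(5, comma);
        String body = uri.substring(comma + 1);

        // A trailing ";base64" marks a base64 payload; otherwise it is percent-encoded.
        String marker = ";base64";
        boolean base64 = header.toLowerCase(Locale.ROOT).endsWith(marker);
        if (base64) {
            header = header.substring(0, header.length() - marker.length());
        }

        // RFC 2397 default when no media type is given.
        String mediaType = header.isEmpty() ? "text/plain;charset=US-ASCII" : header;
        byte[] payload = base64 ? Base64.getDecoder().decode(body) : percentDecode(body);
        return new Parsed(mediaType, payload);
    }

    /** Decodes %XX escapes; other characters are written as single bytes (ASCII assumed). */
    private static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write((byte) c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Parsed p = parse("data:text/plain;base64,aGVsbG8=");
        System.out.println(p.mediaType + " -> "
                + new String(p.payload, StandardCharsets.US_ASCII));
    }
}
```

A value like this in an img src or object data attribute has no network location to resolve,
which is one reason such attributes tend to get their own test cases.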

For reference, we only use Tika core and parsers in the following two classes:

./core/src/main/java/org/apache/any23/mime/TikaMIMETypeDetector.java:import org.apache.tika.mime.MimeTypes;
./core/src/main/java/org/apache/any23/encoding/TikaEncodingDetector.java:import org.apache.tika.parser.txt.CharsetDetector;
 
                
> Upgrade dependency to Apache Tika 1.1
> -------------------------------------
>
>                 Key: ANY23-26
>                 URL: https://issues.apache.org/jira/browse/ANY23-26
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Lewis John McGibbney
>             Fix For: 0.8.0
>
>         Attachments: 14-img-src-data-url.html, 19-object-data-data-uri.html, ANY23-26.patch, org.apache.any23.extractor.html.HCardExtractorTest.txt
>
>
> Upgrading to Apache Tika will hopefully provide a wealth of benefits for the project.
> This issue should act as an umbrella issue to track these changes. It would be great to delegate
> as much as possible to Tika if deemed suitable to enhance functionality and to reduce our
> dependencies on external projects.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
