oodt-commits mailing list archives

From "Rishi Verma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (OODT-630) Upgrade OODT components from using Tika 0.8 to Tika 1.3
Date Wed, 19 Jun 2013 00:40:20 GMT
Rishi Verma created OODT-630:
--------------------------------

             Summary: Upgrade OODT components from using Tika 0.8 to Tika 1.3
                 Key: OODT-630
                 URL: https://issues.apache.org/jira/browse/OODT-630
             Project: OODT
          Issue Type: Improvement
          Components: file manager, metadata container, product server
    Affects Versions: 0.6
            Reporter: Rishi Verma
            Assignee: Rishi Verma
             Fix For: 0.7


Currently, OODT makes use of Tika v0.8 (tika-core) for mime-detection purposes. This version
is quite out-of-date, and is incompatible with the use of a tika-core or tika-app v1.3 JAR.

Tika v1.3 contains numerous upgrades since 0.8 (see [1]), some of which include improved metadata
generation for common files. These improved features are extremely useful for metadata gathering.

If a project using OODT needs features provided by the v1.3 tika-core or tika-app JAR (e.g.
for a custom met extractor), it currently cannot use that version alongside OODT server-side
components such as filemgr or crawler, since v1.3 is incompatible with OODT's use of v0.8.

One of the incompatibilities is that the method org.apache.tika.mime.MimeTypes.getMimeType(URL)
was deprecated and is no longer present in v1.3 (hence the NoSuchMethodError shown below).
It has been replaced by Tika.detect(URL.getPath()).

See the example exception below, thrown when crawler 0.6-SNAPSHOT was invoked while a
'tika-app-1.3.jar' was placed in the crawler's lib directory:
---
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
INFO: ProductCrawler: Ready to ingest product: [/data/staging/IMG_2590.jpg]: ProductType: [GenericFile]
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
INFO: StdIngester: connected to file manager: [http://localhost:9000]
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
INFO: In Place Data Transfer to: [http://localhost:9000] enabled
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tika.mime.MimeTypes.getMimeType(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;
at org.apache.oodt.cas.filemgr.structs.Reference.<init>(Reference.java:115)
at org.apache.oodt.cas.filemgr.versioning.VersioningUtils.addRefsFromUris(VersioningUtils.java:251)
at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:189)
at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:82)
at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:55)
at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
---
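
The NoSuchMethodError above is a link-time failure: the code was compiled against Tika 0.8 but
ran against a JAR that no longer has that method. As a minimal sketch (not part of OODT or Tika),
the check below uses plain Java reflection to report which of the two calls named in this issue
is available on the current classpath. The class name TikaApiProbe and its hasMethod helper are
hypothetical and introduced only for illustration.

```java
import java.net.URL;

// Hypothetical diagnostic helper; it only inspects the classpath and never calls Tika.
final class TikaApiProbe {

    private TikaApiProbe() {}

    /**
     * Returns true if the named class can be loaded and exposes a public
     * method with the given name and parameter types; false otherwise.
     */
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class<?> type = Class.forName(className);
            type.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing, method missing, or class failed to link/initialize.
            return false;
        }
    }

    public static void main(String[] args) {
        // The method whose absence causes the NoSuchMethodError in the trace.
        boolean legacy = hasMethod("org.apache.tika.mime.MimeTypes", "getMimeType", URL.class);
        // The overload that Tika.detect(URL.getPath()) resolves to, since getPath() returns a String.
        boolean facade = hasMethod("org.apache.tika.Tika", "detect", String.class);
        System.out.println("MimeTypes.getMimeType(URL) available: " + legacy);
        System.out.println("Tika.detect(String) available: " + facade);
    }
}
```

Running this with the same lib directory as the crawler should indicate whether the Tika JAR
that gets loaded still provides the method that OODT 0.6 calls.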

This JIRA issue seeks to document efforts to upgrade OODT's use of Tika from 0.8 to 1.3.



---
[1] http://www.apache.org/dist/tika/CHANGES-1.3.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
