oodt-commits mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OODT-630) Upgrade OODT components from using Tika 0.8 to Tika 1.3
Date Wed, 19 Jun 2013 03:57:20 GMT

    [ https://issues.apache.org/jira/browse/OODT-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687567#comment-13687567 ]

Chris A. Mattmann commented on OODT-630:
----------------------------------------

please close OODT-385 when this is fixed.
                
> Upgrade OODT components from using Tika 0.8 to Tika 1.3
> -------------------------------------------------------
>
>                 Key: OODT-630
>                 URL: https://issues.apache.org/jira/browse/OODT-630
>             Project: OODT
>          Issue Type: Improvement
>          Components: file manager, metadata container, product server
>    Affects Versions: 0.6
>            Reporter: Rishi Verma
>            Assignee: Rishi Verma
>             Fix For: 0.7
>
>
> Currently, OODT makes use of Tika v0.8 (tika-core) for mime-detection purposes. This
version is quite out-of-date, and is incompatible with the use of a tika-core or tika-app
v1.3 JAR.
> Tika v1.3 contains numerous upgrades since 0.8 (see [1]), some of which include improved
metadata generation for common files. These improved features are extremely useful for metadata
gathering.
> If a project using OODT needs features provided with the v1.3 tika-core or tika-app JAR
(e.g. custom met extractor), currently they cannot use this version when interacting with
OODT server-side components like filemgr, crawler etc. since it is incompatible with OODT's
use of v0.8.
> One of the incompatibilities is the removal of the previously deprecated method org.apache.tika.mime.MimeTypes.getMimeType(URL).
It has been superseded by Tika.detect(URL.getPath()) and MimeTypes.getRegisteredMimeType(String).
> See the example exception below, thrown when crawler 0.6-SNAPSHOT was invoked while a 'tika-app-1.3.jar'
was placed in the crawler's lib directory:
> ---
> Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> INFO: ProductCrawler: Ready to ingest product: [/data/staging/IMG_2590.jpg]: ProductType:
[GenericFile]
> Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
> INFO: StdIngester: connected to file manager: [http://localhost:9000]
> Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer
setFileManagerUrl
> INFO: In Place Data Transfer to: [http://localhost:9000] enabled
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tika.mime.MimeTypes.getMimeType(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;
> at org.apache.oodt.cas.filemgr.structs.Reference.<init>(Reference.java:115)
> at org.apache.oodt.cas.filemgr.versioning.VersioningUtils.addRefsFromUris(VersioningUtils.java:251)
> at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:189)
> at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
> at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
> at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
> at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
> at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:82)
> at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:55)
> at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
> at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
> at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
> ---
> This JIRA issue seeks to document efforts to upgrade OODT's use of Tika from 0.8 to
1.3.
> ---
> [1] http://www.apache.org/dist/tika/CHANGES-1.3.txt
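
The NoSuchMethodError in the issue description arises because the code was compiled against a method that the Tika JAR found at runtime no longer provides. A minimal sketch of how one might check which of the methods named in the issue are available on a given classpath is shown below. It uses only reflection, so it compiles without Tika; the class name TikaApiCheck and the helper describe are hypothetical and are not part of OODT or Tika.

```java
import java.net.URL;

public class TikaApiCheck {

    /**
     * Reports whether the named class is loadable and exposes a public
     * method with the given name and parameter types.
     * Hypothetical helper for illustration only.
     */
    public static String describe(String className, String methodName, Class<?>... paramTypes) {
        try {
            // Load without running static initializers; only the method table is needed.
            Class<?> cls = Class.forName(className, false, TikaApiCheck.class.getClassLoader());
            cls.getMethod(methodName, paramTypes);
            return "present";
        } catch (ClassNotFoundException e) {
            return "class not on classpath";
        } catch (NoSuchMethodException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        String mimeTypes = "org.apache.tika.mime.MimeTypes";
        String tika = "org.apache.tika.Tika";

        // The method named in the stack trace.
        System.out.println("MimeTypes.getMimeType(URL): "
                + describe(mimeTypes, "getMimeType", URL.class));

        // The replacements named in the issue description.
        System.out.println("Tika.detect(String): "
                + describe(tika, "detect", String.class));
        System.out.println("MimeTypes.getRegisteredMimeType(String): "
                + describe(mimeTypes, "getRegisteredMimeType", String.class));
    }
}
```

Running this with the same classpath as the crawler (for example, its lib directory) should show whether the Tika JAR that is actually loaded still offers the old method or only the replacements. The output depends entirely on which JARs are present.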

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
