oodt-commits mailing list archives

From "Rishi Verma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (OODT-630) Upgrade OODT components from using Tika 0.8 to Tika 1.3
Date Wed, 19 Jun 2013 00:48:20 GMT

     [ https://issues.apache.org/jira/browse/OODT-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rishi Verma updated OODT-630:
-----------------------------

    Description: 
Currently, OODT makes use of Tika v0.8 (tika-core) for mime-detection purposes. This version
is quite out-of-date, and is incompatible with the use of a tika-core or tika-app v1.3 JAR.

Tika v1.3 contains numerous upgrades since 0.8 (see [1]), some of which include improved metadata
generation for common files. These improved features are extremely useful for metadata gathering.

If a project using OODT needs features provided by the v1.3 tika-core or tika-app JAR (e.g.
for a custom met extractor), it currently cannot use that version when interacting with OODT server-side
components such as filemgr and crawler, since v1.3 is incompatible with OODT's use of v0.8.

One of the incompatibilities is the removal of the deprecated method org.apache.tika.mime.MimeTypes.getMimeType(URL).
It has been replaced by Tika.detect(URL.getPath()) and MimeTypes.getRegisteredMimeType(String).
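
To confirm which of these methods a given deployment actually has on its classpath, a small reflection probe can help. The following is only a sketch: the class name TikaApiProbe and the helper hasPublicMethod are hypothetical names introduced for illustration, and the Tika class and method names are taken from the description above. It uses only the Java standard library, so it runs with or without a Tika JAR present.

```java
import java.net.URL;

// Hypothetical helper, not part of OODT or Tika.
final class TikaApiProbe {

    /**
     * Returns true if the named class can be loaded and declares (or inherits)
     * a public method with the given name and parameter types.
     */
    static boolean hasPublicMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> type = Class.forName(className, false, TikaApiProbe.class.getClassLoader());
            type.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing, method missing, or class could not be linked
            return false;
        }
    }

    public static void main(String[] args) {
        String mimeTypes = "org.apache.tika.mime.MimeTypes";
        // Method called by OODT 0.6 (per the stack trace in this issue)
        System.out.println("MimeTypes.getMimeType(URL): "
                + hasPublicMethod(mimeTypes, "getMimeType", URL.class));
        // Replacement method named in this issue
        System.out.println("MimeTypes.getRegisteredMimeType(String): "
                + hasPublicMethod(mimeTypes, "getRegisteredMimeType", String.class));
    }
}
```

Running the probe with the same classpath as the crawler (for example, its lib directory) shows which Tika API generation is visible, without triggering the NoSuchMethodError itself.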

See the example exception below, thrown when crawler 0.6-SNAPSHOT was invoked with a 'tika-app-1.3.jar'
placed in the crawler's lib directory:
---
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
INFO: ProductCrawler: Ready to ingest product: [/data/staging/IMG_2590.jpg]: ProductType: [GenericFile]
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
INFO: StdIngester: connected to file manager: [http://localhost:9000]
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
INFO: In Place Data Transfer to: [http://localhost:9000] enabled
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tika.mime.MimeTypes.getMimeType(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;
at org.apache.oodt.cas.filemgr.structs.Reference.<init>(Reference.java:115)
at org.apache.oodt.cas.filemgr.versioning.VersioningUtils.addRefsFromUris(VersioningUtils.java:251)
at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:189)
at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:82)
at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:55)
at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
---
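
The NoSuchMethodError message above names the missing method using a JVM method descriptor, "(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;", which can be hard to read at a glance. The sketch below decodes such a descriptor into a Java-style signature. The class DescriptorReader and its methods are hypothetical names introduced for illustration; the decoding rules follow the standard JVM descriptor format (L...; for class types, [ for arrays, single letters for primitives and void).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OODT or Tika.
final class DescriptorReader {

    /** Turns a method name plus JVM descriptor into "returnType name(paramTypes)". */
    static String describe(String methodName, String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        List<String> params = new ArrayList<>();
        int[] pos = {1}; // cursor shared with readType; starts just after '('
        while (pos[0] < close) {
            params.add(readType(descriptor, pos));
        }
        pos[0] = close + 1; // the return type follows ')'
        String returnType = readType(descriptor, pos);
        return returnType + " " + methodName + "(" + String.join(", ", params) + ")";
    }

    /** Reads one field type (or V for void) at pos[0] and advances the cursor past it. */
    private static String readType(String d, int[] pos) {
        int dims = 0;
        while (d.charAt(pos[0]) == '[') { // each '[' adds one array dimension
            dims++;
            pos[0]++;
        }
        char code = d.charAt(pos[0]++);
        String name;
        switch (code) {
            case 'L': {
                int end = d.indexOf(';', pos[0]);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated class type in: " + d);
                }
                name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                break;
            }
            case 'V': name = "void"; break;
            case 'Z': name = "boolean"; break;
            case 'B': name = "byte"; break;
            case 'C': name = "char"; break;
            case 'S': name = "short"; break;
            case 'I': name = "int"; break;
            case 'J': name = "long"; break;
            case 'F': name = "float"; break;
            case 'D': name = "double"; break;
            default:
                throw new IllegalArgumentException("Unknown type code: " + code);
        }
        StringBuilder sb = new StringBuilder(name);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("getMimeType",
                "(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;"));
    }
}
```

Applied to the descriptor in the trace, this yields a method named getMimeType that takes a java.net.URL and returns org.apache.tika.mime.MimeType, which is the overload the crawler expected to find.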

This JIRA issue seeks to document efforts to upgrade OODT's use of Tika from 0.8 to 1.3.



---
[1] http://www.apache.org/dist/tika/CHANGES-1.3.txt

  was:
Currently, OODT makes use of Tika v0.8 (tika-core) for mime-detection purposes. This version
is quite out-of-date, and is incompatible with the use of a tika-core or tika-app v1.3 JAR.

Tika v1.3 contains numerous upgrades since 0.8 (see [1]), some of which include improved metadata
generation for common files. These improved features are extremely useful for metadata gathering.

If a project using OODT needs features provided by the v1.3 tika-core or tika-app JAR (e.g.
for a custom met extractor), it currently cannot use that version when interacting with OODT server-side
components such as filemgr and crawler, since v1.3 is incompatible with OODT's use of v0.8.

One of the incompatibilities is the removal of the deprecated method org.apache.tika.mime.MimeTypes.getMimeType(URL).
It has been replaced by Tika.detect(URL.getPath()).

See the example exception below, thrown when crawler 0.6-SNAPSHOT was invoked with a 'tika-app-1.3.jar'
placed in the crawler's lib directory:
---
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
INFO: ProductCrawler: Ready to ingest product: [/data/staging/IMG_2590.jpg]: ProductType: [GenericFile]
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
INFO: StdIngester: connected to file manager: [http://localhost:9000]
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
INFO: In Place Data Transfer to: [http://localhost:9000] enabled
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tika.mime.MimeTypes.getMimeType(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;
at org.apache.oodt.cas.filemgr.structs.Reference.<init>(Reference.java:115)
at org.apache.oodt.cas.filemgr.versioning.VersioningUtils.addRefsFromUris(VersioningUtils.java:251)
at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:189)
at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:82)
at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:55)
at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
---

This JIRA issue seeks to document efforts to upgrade OODT's use of Tika from 0.8 to 1.3.



---
[1] http://www.apache.org/dist/tika/CHANGES-1.3.txt

    
> Upgrade OODT components from using Tika 0.8 to Tika 1.3
> -------------------------------------------------------
>
>                 Key: OODT-630
>                 URL: https://issues.apache.org/jira/browse/OODT-630
>             Project: OODT
>          Issue Type: Improvement
>          Components: file manager, metadata container, product server
>    Affects Versions: 0.6
>            Reporter: Rishi Verma
>            Assignee: Rishi Verma
>             Fix For: 0.7
>
>
> Currently, OODT makes use of Tika v0.8 (tika-core) for mime-detection purposes. This
> version is quite out-of-date, and is incompatible with the use of a tika-core or tika-app
> v1.3 JAR.
> Tika v1.3 contains numerous upgrades since 0.8 (see [1]), some of which include improved
> metadata generation for common files. These improved features are extremely useful for metadata
> gathering.
> If a project using OODT needs features provided by the v1.3 tika-core or tika-app JAR
> (e.g. for a custom met extractor), it currently cannot use that version when interacting with
> OODT server-side components such as filemgr and crawler, since v1.3 is incompatible with OODT's
> use of v0.8.
> One of the incompatibilities is the removal of the deprecated method org.apache.tika.mime.MimeTypes.getMimeType(URL).
> It has been replaced by Tika.detect(URL.getPath()) and MimeTypes.getRegisteredMimeType(String).
> See the example exception below, thrown when crawler 0.6-SNAPSHOT was invoked with a 'tika-app-1.3.jar'
> placed in the crawler's lib directory:
> ---
> Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> INFO: ProductCrawler: Ready to ingest product: [/data/staging/IMG_2590.jpg]: ProductType: [GenericFile]
> Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
> INFO: StdIngester: connected to file manager: [http://localhost:9000]
> Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
> INFO: In Place Data Transfer to: [http://localhost:9000] enabled
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tika.mime.MimeTypes.getMimeType(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;
> at org.apache.oodt.cas.filemgr.structs.Reference.<init>(Reference.java:115)
> at org.apache.oodt.cas.filemgr.versioning.VersioningUtils.addRefsFromUris(VersioningUtils.java:251)
> at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:189)
> at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
> at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
> at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
> at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
> at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:82)
> at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:55)
> at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
> at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
> at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
> ---
> This JIRA issue seeks to document efforts to upgrade OODT's use of Tika from 0.8 to 1.3.
> ---
> [1] http://www.apache.org/dist/tika/CHANGES-1.3.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
