tika-user mailing list archives

From Sergiy Karpenko <sergey.karpe...@exoplatform.com>
Subject Problem with Tika configuration
Date Mon, 19 Jul 2010 13:27:06 GMT
Hello, friends

I want to configure Tika to use only PDFParser.

So I created a tika-config.xml with exactly this content:

<parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
  <mime>application/pdf</mime>
</parser>

And I have this test:

      File file = getResourceAsFile("/test-documents/testPDF.pdf");
      TikaConfig myTC = new TikaConfig(getResourceAsFile("/test-documents/tika-config.xml"));
      String s1 = ParseUtils.getStringContent(file, myTC);

It fails on the last line with:
java.lang.NullPointerException
    at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:111)
    at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:170)
    at org.apache.tika.utils.ParseUtils.getStringContent(ParseUtils.java:188)
    at org.apache.tika.TestParsers.testOwnPDFParser(TestParsers.java:60)

Debugging shows that the configuration loaded from tika-config.xml is incorrect.

The following one works fine:
<blabla>
<parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
  <mime>application/pdf</mime>
</parser>
</blabla>
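
A possible explanation for the difference, offered as a guess and not as a confirmed reading of the Tika source: if TikaConfig collects its parsers with a DOM call such as getElementsByTagName("parser") on the root element, then a <parser> that is itself the root would never be found, because that call returns descendants only. The sketch below uses only the JDK DOM API (no Tika classes) to show that behavior; the class name ParserLookupSketch and the helper countParsers are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class ParserLookupSketch {
    // Hypothetical helper: counts the <parser> elements that a
    // descendant-only lookup on the root element would see.
    static int countParsers(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        // Element.getElementsByTagName returns descendants of root, never root itself.
        return root.getElementsByTagName("parser").getLength();
    }

    public static void main(String[] args) throws Exception {
        String parser =
                "<parser name=\"parse-pdf\" class=\"org.apache.tika.parser.pdf.PDFParser\">"
                + "<mime>application/pdf</mime></parser>";
        // <parser> as the document root: the lookup finds nothing.
        System.out.println(countParsers(parser));                           // prints 0
        // <parser> wrapped in any root element: the lookup finds it.
        System.out.println(countParsers("<blabla>" + parser + "</blabla>")); // prints 1
    }
}
```

If Tika really works this way, the name of the wrapping root element would not matter, which would fit the <blabla> file being accepted.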

Is there any documentation about Tika configuration, or at least a link to
correct and well formed tika-config.xml?

Thanks
