tika-user mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Problem with Tika configuration
Date Wed, 21 Jul 2010 05:44:18 GMT
Hi,

On Mon, Jul 19, 2010 at 4:27 PM, Sergiy Karpenko
<sergey.karpenko@exoplatform.com> wrote:
> I want to configure Tika to use only PDFParser

The easiest way to achieve this is to directly use the PDFParser class
instead of working through the configuration.

>       File file = getResourceAsFile("/test-documents/testPDF.pdf");
>       TikaConfig myTC = new TikaConfig(
>               getResourceAsFile("/test-documents/tika-config.xml"));
>       String s1 = ParseUtils.getStringContent(file, myTC);

Use something like this instead:

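    import java.io.File;
    import java.io.InputStream;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.parser.pdf.PDFParser;
    import org.apache.tika.sax.BodyContentHandler;
    import org.xml.sax.ContentHandler;

    // PDFParser is instantiated directly, so no TikaConfig is involved
    Parser parser = new PDFParser();
    ContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();

    InputStream stream = TikaInputStream.get(new File("document.pdf"));
    try {
        parser.parse(stream, handler, metadata, context);
    } finally {
        stream.close();
    }

    // The handler has collected the body text during parsing
    String content = handler.toString();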
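
For context on why handler.toString() returns the extracted text: Tika
parsers report the document as XHTML SAX events, and BodyContentHandler
collects the character data found inside the body element. The sketch below
imitates that mechanism with JDK classes only, so it runs without Tika on the
classpath. It is an illustration, not Tika's actual implementation; the names
TextCollectingDemo, TextCollector and extractBodyText are made up for this
example, and the XHTML string stands in for the events a parser would emit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollectingDemo {

    /** Hypothetical stand-in for BodyContentHandler: keeps only text inside body. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inBody = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default SAXParserFactory is not namespace-aware, so match on qName
            if ("body".equals(qName)) {
                inBody = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("body".equals(qName)) {
                inBody = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Accumulate character data only while inside the body element
            if (inBody) {
                text.append(ch, start, length);
            }
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    /** Feeds an XHTML string through a SAX parser and returns the collected body text. */
    static String extractBodyText(String xhtml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>Ignored</title></head>"
                + "<body><p>Hello PDF</p></body></html>";
        System.out.println(extractBodyText(xhtml)); // prints: Hello PDF
    }
}
```

The title text is dropped because it sits outside body, which is the same
reason handler.toString() in the snippet above yields the page content rather
than the document metadata.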

BR,

Jukka Zitting
