pdfbox-users mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?
Date Wed, 20 Apr 2016 20:20:21 GMT
Might want to look at Tika (which uses PDFBox) for that.

Let's say you have an <inputdir> that contains your zips.

java -jar tika-app.jar -J -t -i <inputdir> -o <outputdir>

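Here -J writes recursive JSON output covering every embedded file (so the PDFs inside each zip are processed), -t requests plain-text content, and -i/-o run batch mode over the input and output directories.

See if that gets you close enough.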
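
If the output needs to be one text file per PDF in the same folder tree, a small amount of Java on top of java.util.zip can do the mirroring. The following is only a rough sketch under a few assumptions: the class name ZipTreeTextExtractor and the pdfToText parameter are made up for illustration, and pdfToText stands in for the real extraction step (for example a wrapper around PDFBox's PDFTextStripper, or a Tika parser), which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of PDFBox or Tika.
public class ZipTreeTextExtractor {

    /**
     * Writes one .txt file per .pdf entry in the zip, mirroring the zip's
     * folder tree under outputRoot. Returns the paths that were written.
     * pdfToText is a placeholder for the actual PDF text extraction.
     */
    public static List<Path> extractAll(Path zipFile, Path outputRoot,
            Function<InputStream, String> pdfToText) throws IOException {
        List<Path> written = new ArrayList<>();
        Path root = outputRoot.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(zipFile.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Only regular entries whose name ends in .pdf are handled.
                if (entry.isDirectory()
                        || !name.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                    continue;
                }
                // a/b/report.pdf -> <outputRoot>/a/b/report.txt
                String txtName = name.substring(0, name.length() - 4) + ".txt";
                Path target = root.resolve(txtName).normalize();
                // Skip entries whose path would land outside the output root.
                if (!target.startsWith(root)) {
                    continue;
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    String text = pdfToText.apply(in);
                    Files.write(target, text.getBytes(StandardCharsets.UTF_8));
                }
                written.add(target);
            }
        }
        return written;
    }
}
```

Calling extractAll once per zip in <inputdir>, with a different outputRoot for each, would keep the trees from separate zips apart. Zips nested inside zips are not handled by this sketch.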

-----Original Message-----
From: davidgreen.co.uk@gmail.com [mailto:davidgreen.co.uk@gmail.com] On Behalf Of David Green
Sent: Wednesday, April 20, 2016 3:51 PM
To: users@pdfbox.apache.org
Subject: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?

. . . and save the text files in the same tree structure on another drive ?
this seems a big ask

-- 
Regards
David