pdfbox-users mailing list archives

From David Green <da...@davidgreen.co.uk>
Subject Re: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?
Date Sun, 01 May 2016 01:06:55 GMT
Sorry for using the wrong forum.
Is there a Tika forum?

Your suggested command is working, after a fashion:
java -jar c:\jars\tika-app-1.12.jar -J -t -i f: -o g:
The directory structure is being reproduced, but the zip files appear to
be copied as zip files (I think).
The copied files keep the original filename (including the original .zip
extension) with an additional .json extension.
However, when I try to open one of them with B1 file archiver, it reports
a corrupt file.
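
As a cross-check on what a source zip actually contains, here is a minimal sketch that lists the PDF entries inside a zip using only java.util.zip from the JDK. It does not extract any text and does not use Tika or PDFBox. The class name ZipPdfLister and the method listPdfEntries are hypothetical names chosen for illustration, and the sketch assumes that PDF entries can be recognised by a .pdf file extension.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Tika or PDFBox.
public class ZipPdfLister {

    // Returns the names (including folder paths) of all entries in the zip
    // whose name ends in ".pdf", ignoring case. Assumption: PDFs are
    // identified by extension only; the entry contents are not inspected.
    public static List<String> listPdfEntries(InputStream zipStream) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = entry.getName();
                if (!entry.isDirectory()
                        && name.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    // Usage: java ZipPdfLister <path-to-zip>
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : listPdfEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

Running this against one of the zip files on the input drive prints one line per PDF entry, with the folder path inside the zip preserved in the entry name.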
