pdfbox-users mailing list archives

From Hannes Erven <han...@erven.at>
Subject Re: [SURVEY] PDFBox Uses Cases
Date Mon, 06 Jan 2014 18:18:12 GMT
Hi,


I'm using PDFBox in a client's project to:


- set crop/media boxes to automatically crop whitespace and/or unwanted 
content
(the actual cut points are calculated from the output of Ghostscript's 
bbox device, combined with text extraction from "suspect" unwanted areas)

- extract individual pages from foreign documents

- add overlays to existing documents (such as a "COPY" stamp on an invoice 
PDF, or a highlight on a particular area of a page), or "underlay" a page 
with another document [e.g. 'business paper']

- extract text from foreign documents (or parts of such documents) for 
full-text-search

- "convert" images to PDF documents (in that case, one image per page)
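
To illustrate the cropping item above, here is a minimal sketch (not the 
project's actual code) of how the cut points could be read from Ghostscript. 
It assumes gs was run with -sDEVICE=bbox and that its output was captured as 
a string; the class name GsBBox, the pad() helper and the idea of adding a 
margin are my own assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses the bounding box reported by "gs -sDEVICE=bbox".
final class GsBBox {
    // Matches a line like "%%HiResBoundingBox: 36.000000 36.000000 576.000000 756.000000"
    private static final String NUM = "(-?\\d+(?:\\.\\d+)?)";
    private static final Pattern HIRES = Pattern.compile(
        "%%HiResBoundingBox:\\s+" + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+" + NUM);

    /** Returns {llx, lly, urx, ury} of the first bbox line found, or null if there is none. */
    static double[] parse(String gsOutput) {
        Matcher m = HIRES.matcher(gsOutput);
        if (!m.find()) {
            return null;
        }
        double[] box = new double[4];
        for (int i = 0; i < 4; i++) {
            box[i] = Double.parseDouble(m.group(i + 1));
        }
        return box;
    }

    /** Grows the box by margin on every side, but never beyond the media box (assumed policy). */
    static double[] pad(double[] box, double margin, double[] media) {
        return new double[] {
            Math.max(media[0], box[0] - margin),
            Math.max(media[1], box[1] - margin),
            Math.min(media[2], box[2] + margin),
            Math.min(media[3], box[3] + margin)
        };
    }
}
```

The resulting rectangle would then be handed to PDFBox as the page's crop 
box (PDPage.setCropBox). For multi-page input gs prints one bbox line per 
page, so a real implementation would loop over all matches instead of 
taking only the first.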



What I would like to do is "optimize" a document by removing everything 
that is not related to the currently "visible" (possibly cropped) area 
of the document, including metadata. I once asked about metadata removal 
on the mailing list (see 
http://mail-archives.apache.org/mod_mbox/pdfbox-dev/201307.mbox/%3C51DD62B8.3080401@lehmi.de%3E
), but since that is still "only" a nice-to-have for my project, I have 
yet to look further into how to "write back the [modified] PDmetadata 
stream" (and then supply a patch ;-] ).


Anyway, PDFBox has always been a very valuable tool for me. This survey 
is a perfect occasion to say THANK YOU to the busy community!


Best regards,

	-hannes erven
