pdfbox-users mailing list archives

From <viraf.bankwa...@yahoo.com.INVALID>
Subject Extracting layout information and text from searchable PDF
Date Mon, 27 Feb 2017 18:35:58 GMT
I have a number of searchable PDF documents from which I want to extract layout information
and text. The documents are mixed: some pages are structured (e.g. forms), while
others are unstructured free-form text (e.g. letters, reports, etc.).
I was wondering whether there are any projects that provide such capabilities. I am familiar
with PdfTextExtractor, and it would probably be my starting point if I were to build this functionality myself.
- viraf
