pdfbox-users mailing list archives

From <j.tosov...@email.cz>
Subject Re: Extract embedded SVG image from PDF file
Date Thu, 07 Mar 2019 10:06:14 GMT

> The first thing I did was a Web Robot, that crawls all pages for each
> student and gets the necessary information. This significantly saves time,
> but again requires human interference and time. PDFs that are regularly
> sent automatically by email, for each student, contain all the necessary
> information, that the Web Robot collects.

This thread is drifting off topic. You should switch to StackOverflow and
provide a detailed description of the whole process there. If properly
tagged, your question can attract people with broader knowledge of this
particular topic than I have.

Just a note: parsing data from a PDF is always harder than from a database or
plain text formats (XML/JSON/CSV). If an engine can export data to PDF, it
can potentially export the same data to formats better suited for bulk

