pdfbox-users mailing list archives

From European Neuroscience Center <mnachev.nscenter...@gmail.com>
Subject Re: Extract embedded SVG image from PDF file
Date Thu, 07 Mar 2019 09:39:06 GMT
Hi Jan,

The first thing I built was a web robot that crawls all pages for each
student and collects the necessary information. This saves a lot of time,
but it still requires human intervention and effort. The PDFs that are sent
automatically by email for each student on a regular schedule contain all
the information that the web robot collects.

Do you think these activities and processes could be fully automated with
Selenium <https://www.seleniumhq.org/>?


Regards,
Miro.

On Thu, Mar 7, 2019 at 11:11 AM <j.tosovsky@email.cz> wrote:

>
> > We have access to the sources (the website), but this is
> > time-consuming. There are web services we can use for some tasks, but
> > not for all of them. The PDF files are generated automatically on a
> > schedule, so this route can be fully automated.
>
> Assuming your SVG data are available on some website and, instead of
> downloading them one by one, you prefer to extract them in bulk from PDF
> snapshots of these pages, I'd recommend avoiding the PDF route and
> automating the SVG download step instead.
>
> First, I'd ask the app developers to provide an API for getting the data
> via a web service. Only if there were no other option would I try to guess
> the SVG image URL for each page/article. If there is a predictable relation
> between the page URL and the image URL, automation is easy. If not, you
> could automate your manual steps with browser-testing tools, see e.g.
> https://www.seleniumhq.org/.
>
> Jan
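
To illustrate Jan's "guess the SVG URL" suggestion, here is a minimal Python sketch using only the standard library. It assumes a hypothetical site convention in which a student page at `https://example.org/students/<id>` publishes its image at `https://example.org/students/<id>/chart.svg`; the URL pattern, the `chart.svg` name and the helper names `svg_url_for` and `download_svgs` are illustrative assumptions, not anything confirmed in this thread. If the real site has no such predictable relation, browser automation (e.g. Selenium) remains the fallback.

```python
import urllib.request
from pathlib import Path
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit

# ASSUMPTION (hypothetical convention, not from the thread):
# page  https://example.org/students/<id>
# image https://example.org/students/<id>/chart.svg
SVG_SUFFIX = "chart.svg"


def svg_url_for(page_url: str) -> str:
    """Derive the SVG image URL from a page URL under the assumed convention."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/") + "/" + SVG_SUFFIX
    # Keep scheme and host; drop any query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def download_svgs(page_urls: Iterable[str], out_dir: str) -> List[Path]:
    """Fetch the SVG for every page URL and save it under out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for page_url in page_urls:
        svg_url = svg_url_for(page_url)
        # Name each file after the page's last path segment (e.g. the student id).
        name = urlsplit(page_url).path.rstrip("/").rsplit("/", 1)[-1] + ".svg"
        with urllib.request.urlopen(svg_url, timeout=30) as resp:
            data = resp.read()
        dest = target / name
        dest.write_bytes(data)
        saved.append(dest)
    return saved
```

Once the pattern is confirmed, a scheduled job (cron or similar) could call `download_svgs` with the list of student page URLs, which replaces the PDF-parsing step entirely.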
