From: European Neuroscience Center
Date: Thu, 7 Mar 2019 11:39:06 +0200
Subject: Re: Extract embedded SVG image from PDF file
To: users@pdfbox.apache.org

Hi Jan,

The first thing I built was a web robot that crawls all pages for each
student and collects the necessary information. This saves a significant
amount of time, but it still requires human intervention.

The PDFs that are sent automatically by email for each student on a regular
schedule contain all the information that the web robot collects.

Do you think these activities and processes can be fully automated with
Selenium?

Regards,
Miro.

On Thu, Mar 7, 2019 at 11:11 AM wrote:

> > We have access to the sources (website), but this is time
> > consuming. There are web services we can use, but not for
> > all tasks. The PDF files are generated automatically on a schedule,
> > so this route can be fully automated.
>
> Supposing your SVG data are available on some website and, instead of
> downloading them one by one, you prefer to extract them in bulk from PDF
> snapshots of these pages, I'd recommend avoiding the PDF route and
> automating the SVG download step instead.
>
> Firstly, I'd ask the app developers to provide an API to get the data via
> a web service. Only if there is no other option would I try guessing the
> SVG image URL for each page/article. If there is a predictable relation
> between the page and the image URL, automation is easy. If not, you could
> automate your manual steps with testing tools, see e.g.
> https://www.seleniumhq.org/.
>
> Jan
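
For illustration, the URL-guessing route Jan mentions could look roughly like the sketch below. This is only a sketch under stated assumptions: the URL pattern, the `example.org` host, and the function names are hypothetical, since the thread does not say how the site addresses its SVG images. It uses only the Python standard library, not Selenium or PDFBox.

```python
import re
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical URL pattern: the real site layout is not known from this thread.
SVG_URL_TEMPLATE = "https://example.org/students/{student_id}/chart.svg"


def build_svg_url(student_id: str, template: str = SVG_URL_TEMPLATE) -> str:
    """Fill the student id into the assumed URL pattern, escaping unsafe characters."""
    return template.format(student_id=quote(student_id, safe=""))


def looks_like_svg(payload: bytes) -> bool:
    """Cheap sanity check: is there an <svg ...> tag near the start of the payload?"""
    head = payload[:2048].decode("utf-8", errors="replace")
    return re.search(r"<svg[\s>]", head) is not None


def download_svg(student_id: str, timeout: float = 10.0) -> bytes:
    """Fetch the SVG for one student and reject responses that are not SVG."""
    url = build_svg_url(student_id)
    with urlopen(url, timeout=timeout) as response:
        payload = response.read()
    if not looks_like_svg(payload):
        # For example, a login page or an error page returned with status 200.
        raise ValueError(f"{url} did not return SVG data")
    return payload
```

Calling `download_svg` in a loop over the known student ids would replace the manual one-by-one download. If no predictable URL pattern exists, this approach does not apply and a browser-automation tool such as Selenium remains the fallback.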