pdfbox-users mailing list archives

From Evan Williams <evan.willi...@zapprx.com>
Subject Re: Need Help With A Problematic PDF
Date Mon, 06 Mar 2017 18:00:28 GMT
Hi Tilman,

Unfortunately I am most definitely not the creator of the PDF. I get the
forms from the drug manufacturers and pharmacies that produce them. And,
unfortunately, I am legally constrained, for at least some of these forms, to
use their exact PDF (or an exact, perfect recreation of it). The quality of
these forms is extremely variable.

My job is to make the best possible tools to take prescriptions stored in a
predictable, regular electronic record and use them to fill out these
specific forms, of which I have dozens, trending to hundreds.

That is the 'interesting' aspect of my job.

So unfortunately, improving the PDF by recreating it is not a viable
option. And even if it were, I would have to repeat that an unbounded
number of times, once for every 'bad' PDF in my library.

I am trying some things and I will tell you what I come up with.
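
A rough sketch of the bookkeeping that the 4 Mb upload limit implies once the
form has been split into single-page files. This is not code from the thread:
the class name FaxUploadBatcher, the method batchPages, and the reading of
"4 Mb" as 4 * 1024 * 1024 bytes are all assumptions made for illustration.
Given the size of each single-page file, it greedily groups consecutive pages
into batches whose summed size stays within the limit. The summed size is only
an estimate of what a recombined file would weigh, so each actual upload would
still need to be checked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox or of the fax service's API.
final class FaxUploadBatcher {
    // Assumption: the "4 Mb" limit from the thread means 4 * 1024 * 1024 bytes.
    static final long LIMIT_BYTES = 4L * 1024 * 1024;

    /**
     * Greedily groups consecutive pages into batches whose summed size is at
     * most limitBytes. Each batch is a list of zero-based page indexes.
     * Throws if a single page is already larger than the limit.
     */
    static List<List<Integer>> batchPages(long[] pageSizes, long limitBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < pageSizes.length; i++) {
            long size = pageSizes[i];
            if (size > limitBytes) {
                throw new IllegalArgumentException(
                        "Page " + i + " alone exceeds the limit: " + size + " bytes");
            }
            // Close the current batch if adding this page would overflow it.
            if (!current.isEmpty() && currentBytes + size > limitBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(i);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Made-up page sizes in bytes, purely for demonstration.
        long[] sizes = {3_000_000L, 900_000L, 2_500_000L, 1_200_000L};
        System.out.println(batchPages(sizes, LIMIT_BYTES));
    }
}
```

The per-page sizes would come from the files written after splitting, for
example via java.nio.file.Files.size. A page that exceeds the limit on its own
is reported as an error rather than silently dropped, since no batching can
help in that case.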

Thank you so much for investing so much time in my problem. I truly
appreciate it.

On Mon, Mar 6, 2017 at 12:30 PM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> On 06.03.2017 at 18:09, Evan Williams wrote:
>> Hi Tilman!
>> So, it develops that the fax service has an upper limit of 4 Mb per
>> individual uploaded file. So I will probably try to implement your PDFSplit
>> solution (I tried compressing the file but was unable to get it under 4 Mb).
>> I am concerned about memory and processing time for the PDFSplit but the
>> fax service seems unwilling to contemplate fixing this on their end (their
>> actual render works fine with the form, it is just their API layer that
>> imposes the limit).
> Thanks for the news... I had another look at your PDF. It is not very
> efficient... for example, the same colorspace is used several times; the
> same font (but different subsets) is in each page, instead of only one
> entry for each font. The company logos are not in a form XObject, so they
> are repeated for each page. If you're the creator of the file, maybe you
> can do something there...
> Tilman
>> On Fri, Mar 3, 2017 at 12:35 PM, Tilman Hausherr <THausherr@t-online.de>
>> wrote:
>>> On 03.03.2017 at 14:31, Evan Williams wrote:
>>>> The first sign of trouble is, various PDF renderers in browsers were
>>>> displaying fields that had been filled in as blank (intermittently).
>>>> Chrome's PDF viewer was one such culprit. Need Appearances was set to true
>>>> with this form, so I set it to False and refreshed the appearances, which
>>>> may or may not have fixed it (if it doesn't I will do a full flatten on it).
>>>> More seriously, the faxing service that we use chokes on this form. They
>>>> are unable to render it even if I don't fill it out and send them the
>>>> original PDF. They are investigating this on their end, but since this is
>>>> the only PDF that has ever had this issue and because of its previous
>>>> suspicious behavior, I believe that it might be corrupt in some subtle way
>>>> that I don't understand. PDFBox seems to have no trouble working with it,
>>>> and it is viewable and printable in Acrobat and in third party viewers. But
>>>> I am uncertain.
>>> It would be useful to have a version of this PDF with entries and then see
>>> what happens with Chrome. And then pass that one there.
>>> Re that fax service, try PDFSplit to split it in single pages, and then
>>> try with every single page. That should narrow the problem.
>>> PDF is a complex format... there are many (including us) that don't
>>> implement all. Normally this shouldn't lead to a crash.
>>> Tilman
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org

*Evan Williams*
Sr. Software Engineer

*www.ZappRx.com <http://www.zapprx.com/>*
