pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Sanitizing input?
Date Tue, 11 Aug 2015 17:52:45 GMT
Hi Roberto,

> On 10.08.2015 at 02:22, Roberto Nibali <rnibali@gmail.com> wrote:
> 
> Hi
> 
> Disclaimer: I have very limited knowledge of the PDF standard and only
> command the basics of PDFBox, however I have had my share of thrills with
> the IRS.
> 
> What's the final purpose of those filled out PDFs? Do you intend to be MeF (
> http://www.irs.gov/pub/irs-pdf/p4164.pdf) compliant? Are we talking about
> such PDFs (http://www.irs.gov/pub/irs-pdf/, which btw are/could be quite a
> test bed for PDFBox)?

I ran those PDFs through a load document / get AcroForm / close cycle with no issues.
That was only an initial, very basic test.

BR
Maruan 


> If MeF sounds intriguing to you, "simply" model and
> validate the input with the IRS' XSD for MeF and model your application
> around such a stable data governance.
> 
> Generally the IRS does extensive post-processing on the input documents, so
> I wouldn't bother too much. But depending on the kind of service you offer,
> your mileage will vary. Now, we had our share of fun with the IRS when
> filling out claims from "untrusted sources". If you provide a certified tax
> service, you might also need to adhere to processing standards set forth by
> the NIST, as in NIST SP-800-xx (53, for example), outlined for agencies in
> http://www.irs.gov/pub/irs-pdf/p1075.pdf.
> 
> If you're just doing it for some friends, apply the basic sanitizing
> aspects you figure out and go from there. Improve it over time, depending
> on the feedback of the IRS' process.
> 
> Best regards
> 
> Roberto
> 
> 
> On Sun, Aug 9, 2015 at 11:10 PM, Stuart Small <stuart.alan.small@gmail.com>
> wrote:
> 
>> I am putting together a system that automatically generates some tax forms
>> from user input.  The original PDFs are provided by the IRS; I will just
>> be plugging user input into the relevant fields.
>> 
>> PDF is a large file format that I don't fully understand.  I've been
>> surprised before by some of the things it is capable of.  So that got me
>> thinking: is there any sanitization I need to perform on the user input
>> before generating the PDF?  Are there any special cases I should keep in
>> mind when filling in forms with arbitrary strings from an untrusted source?
>> 
>> Thanks in advance!
>> 
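
On the sanitizing question quoted above: a minimal sketch, in plain Java with
no PDFBox calls, of the kind of basic cleanup one might apply to each
user-supplied string before setting it as a form field value. The class name
FieldInput, the method sanitize and the maxLength parameter are all
hypothetical, and the choice of what to strip (line breaks, control and
format characters) is an assumption for single-line text fields, not a
PDFBox or IRS requirement.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of PDFBox: cleans one untrusted string
// before it is used as the value of a single-line form field.
final class FieldInput {
    private FieldInput() {}

    static String sanitize(String raw, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        if (raw == null) {
            return "";
        }
        // Fold compatibility forms (ligatures, full-width letters) into
        // their plain equivalents.
        String s = Normalizer.normalize(raw, Normalizer.Form.NFKC);
        // Assumption: the target field is single-line, so line breaks and
        // tabs become a single space.
        s = s.replaceAll("[\\r\\n\\t]+", " ");
        // Drop remaining control (Cc) and invisible format (Cf) characters.
        s = s.replaceAll("[\\p{Cc}\\p{Cf}]", "");
        // Trim the ends and collapse runs of spaces.
        s = s.trim().replaceAll(" {2,}", " ");
        // Truncate by code points so a surrogate pair is never split.
        if (s.codePointCount(0, s.length()) > maxLength) {
            s = s.substring(0, s.offsetByCodePoints(0, maxLength));
        }
        return s;
    }
}
```

A caller would pass each raw value through something like
FieldInput.sanitize(userValue, 40) before handing the result to the form
field. The length limit would have to come from the form itself, and
characters the field's font cannot encode may still need separate handling.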

