pdfbox-users mailing list archives

From Davide Zoni <Davide.Z...@Cedacri.it>
Subject RE: Check for scripts in a PDF
Date Mon, 29 Aug 2016 11:16:19 GMT
Hello,

you can find a multimedia example here (I'm aware that the code suggested by Tilman
might not work here):

http://media.washingtonpost.com/wp-adv/advertisers/Adobe/Obama/090808/ObamaPort.pdf

or here (the first one):

http://www.pdfscripting.com/public/Free-Sample-PDF-Files-with-scripts.cfm

where PDAcroForm is not null, but the code fails to detect the JavaScript fields.
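Some background on why field-based code can miss scripts: the PDF specification also allows JavaScript actions outside form fields, for example in the catalog's /OpenAction, in the document-level /Names /JavaScript name tree, in page and annotation /AA entries, and in /Next chains of other actions. One way to cover these is to walk every dictionary reachable from the catalog and flag any dictionary that has /S /JavaScript or a /JS entry. The sketch below shows that traversal without PDFBox. The Map/List object model and the JsWalker class are hypothetical stand-ins for COSDictionary/COSArray, used only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, PDFBox-free model of a PDF object graph, for illustration only:
 * dictionaries are Map<String, Object> (keys without the leading slash),
 * arrays are List<Object>, name values are Strings starting with "/".
 */
class JsWalker {

    /** Returns the key path of every dictionary that looks like a JavaScript action. */
    static List<String> findJavaScript(Map<String, Object> catalog) {
        List<String> hits = new ArrayList<>();
        // Identity-based set: PDF graphs are cyclic (e.g. /Parent), and hashing
        // a cyclic HashMap would recurse forever, so compare by reference instead
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(catalog, "Root", hits, visited);
        return hits;
    }

    private static void walk(Object node, String path, List<String> hits, Set<Object> visited) {
        if (node instanceof Map) {
            if (!visited.add(node)) {
                return; // already reached through another reference
            }
            Map<?, ?> dict = (Map<?, ?>) node;
            // A JavaScript action has /S /JavaScript and carries the script in /JS
            if ("/JavaScript".equals(dict.get("S")) || dict.containsKey("JS")) {
                hits.add(path);
            }
            for (Map.Entry<?, ?> entry : dict.entrySet()) {
                walk(entry.getValue(), path + "/" + entry.getKey(), hits, visited);
            }
        } else if (node instanceof List) {
            if (!visited.add(node)) {
                return;
            }
            List<?> array = (List<?>) node;
            for (int i = 0; i < array.size(); i++) {
                walk(array.get(i), path + "[" + i + "]", hits, visited);
            }
        }
        // Strings and other leaf values cannot contain nested actions and are ignored
    }
}
```

Porting this to PDFBox 2.0.x would presumably mean starting from pdfDocument.getDocumentCatalog().getCOSObject() and dereferencing indirect COSObject values with getObject(); that port is an untested assumption, not code from this thread.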

Thanks.

        Davide Zoni

        Cedacri S.p.A.

        Tel.: 0521807433

        e-mail: davide.zoni@cedacri.it

        www.cedacri.it


________________________________________
From: Maruan Sahyoun [sahyoun@fileaffairs.de]
Sent: Monday, 29 August 2016 11:28
To: users@pdfbox.apache.org
Subject: Re: Check for scripts in a PDF

Hi,

> On 29.08.2016 at 11:09, Davide Zoni <Davide.Zoni@Cedacri.it> wrote:
>
> Hi everybody again,
>
> I'm trying to figure out whether your method is suitable for my needs, but every time
> I try to access the AcroForm (even in a PDF file with scripts and forms), it's null.

Could you upload a sample PDF to a public site so we can take a look? An interactive PDF form
should have an AcroForm entry.

BR
Maruan


> Am I loading the file the wrong way? Am I missing something?
>
> Best regards.
>
> ________________________________________
> From: Tilman Hausherr [THausherr@t-online.de]
> Sent: Wednesday, 24 August 2016 18:24
> To: users@pdfbox.apache.org
> Subject: Re: Check for scripts in a PDF
>
> On 24.08.2016 at 15:41, Davide Zoni wrote:
>> Thank you. This might be helpful, but I'm afraid I would not be able to check
>> every possibility. Is there a way to check whether a PDF is static (or dynamic)? For our purpose
>> that should be enough.
>
> No, there is no such method.
>
> Tilman
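Lacking such a method, a rough pre-filter could approximate "dynamic" as "contains names that introduce active content". The following is a minimal sketch in the spirit of pdfid-style scanners, not a PDFBox feature. It counts a hypothetical list of risky names (/JS, /JavaScript, /AA, /OpenAction, ...) in the raw file bytes and decodes #xx escapes, so an obfuscated /J#61vaScript is still counted as /JavaScript. Because it does not inflate compressed streams, names stored in object streams (PDF 1.5 and later) are missed, and names appearing inside strings can produce false positives.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical pre-filter: counts "risky" PDF name tokens in the raw file bytes. */
class PdfNameScan {

    // Assumed list of names that indicate active content; not exhaustive
    static final Set<String> RISKY = new HashSet<>(Arrays.asList(
            "JS", "JavaScript", "AA", "OpenAction", "Launch", "RichMedia", "XFA", "EmbeddedFile"));

    // PDF delimiters and white-space characters terminate a name token
    private static boolean endsName(int c) {
        return "()<>[]{}/%".indexOf(c) >= 0
                || c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    /** Counts risky names, decoding #xx escapes (e.g. /J#61vaScript is read as JavaScript). */
    static Map<String, Integer> countRiskyNames(byte[] data) {
        Map<String, Integer> counts = new TreeMap<>();
        int i = 0;
        while (i < data.length) {
            if (data[i] != '/') {
                i++;
                continue;
            }
            i++; // skip the slash that starts the name
            StringBuilder name = new StringBuilder();
            while (i < data.length && !endsName(data[i] & 0xFF)) {
                int c = data[i] & 0xFF;
                if (c == '#' && i + 2 < data.length) {
                    int hi = Character.digit(data[i + 1] & 0xFF, 16);
                    int lo = Character.digit(data[i + 2] & 0xFF, 16);
                    if (hi >= 0 && lo >= 0) {
                        name.append((char) (hi * 16 + lo)); // decoded escape
                        i += 3;
                        continue;
                    }
                }
                name.append((char) c);
                i++;
            }
            String decoded = name.toString();
            if (RISKY.contains(decoded)) {
                counts.merge(decoded, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Hypothetical heuristic: treat the file as "dynamic" if any risky name occurs. */
    static boolean looksDynamic(byte[] data) {
        return !countRiskyNames(data).isEmpty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfNameScan file.pdf");
            return;
        }
        Map<String, Integer> counts = countRiskyNames(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(counts.isEmpty() ? "no risky names found" : "possibly dynamic: " + counts);
    }
}
```

A clean result therefore only means "nothing obvious found"; files it flags would still need a real parse (e.g. with PDFBox) to see where the scripts actually are.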
>
>
>> Best regards.
>>
>> ________________________________________
>> From: Tilman Hausherr [THausherr@t-online.de]
>> Sent: Tuesday, 23 August 2016 18:23
>> To: users@pdfbox.apache.org
>> Subject: Re: Check for scripts in a PDF
>>
>> On 23.08.2016 at 09:35, Davide Zoni wrote:
>>> Yes, I'm trying to detect files with scripts, i.e. files that are not static. I don't understand
>>> what you mean by "Maybe compare with the preflight source code to check that you didn't
>>> miss something". Can you elaborate on that?
>> I meant to search for "Javascript" in the preflight source code, and then see
>> where it is used. This is just so that you can be more sure that you got
>> everything when you read the PDF specification.
>>
>> Btw, I once wrote some code to show (some) JavaScript fields, see below,
>> or search for "Roberto Nibali Javascript". He also improved that code
>> and posted the improved version. It may not find all JavaScript content,
>> but it could help show you how to write such code.
>>
>> Tilman
>>
>>
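>> import java.io.File;
>> import java.io.IOException;
>> import java.util.List;
>>
>> import org.apache.pdfbox.cos.COSDictionary;
>> import org.apache.pdfbox.pdmodel.PDDocument;
>> import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
>> import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
>> import org.apache.pdfbox.pdmodel.interactive.action.PDActionJavaScript;
>> import org.apache.pdfbox.pdmodel.interactive.action.PDAnnotationAdditionalActions;
>> import org.apache.pdfbox.pdmodel.interactive.action.PDFormFieldAdditionalActions;
>> import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
>> import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
>> import org.apache.pdfbox.pdmodel.interactive.form.PDField;
>> import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;
>> import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;
>>
>> public class PrintJavaScriptFields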
>> {
>>
>>      /**
>>       * This will print the JavaScript actions attached to the form fields of the document.
>>       *
>>       * @param pdfDocument The PDF to get the fields from.
>>       *
>>       * @throws IOException If there is an error getting the fields.
>>       */
>>      public void printFields(PDDocument pdfDocument) throws IOException
>>      {
>>          PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
>>          PDAcroForm acroForm = docCatalog.getAcroForm();
>>          if (acroForm == null)
>>          {
>>              // No /AcroForm entry: the document has no interactive form fields,
>>              // although JavaScript may still be present elsewhere (e.g. document-level scripts)
>>              System.out.println("No AcroForm found");
>>              return;
>>          }
>>          List<PDField> fields = acroForm.getFields();
>>
>>          //System.out.println(fields.size() + " top-level fields were found on the form");
>>
>>          for (PDField field : fields)
>>          {
>>              processField(field, "|--", field.getPartialName());
>>          }
>>      }
>>
>>      private void processField(PDField field, String sLevel, String sParent) throws IOException
>>      {
>>          String partialName = field.getPartialName();
>>
>>          if (field instanceof PDTerminalField)
>>          {
>>              PDTerminalField termField = (PDTerminalField) field;
>>              for (PDAnnotationWidget widget : termField.getWidgets())
>>              {
>>                  PDAction action = widget.getAction();
>>                  if (action instanceof PDActionJavaScript)
>>                  {
>>                      System.out.println(field.getFullyQualifiedName() + ": "
>>                              + action.getClass().getSimpleName() + " js widget action:\n"
>>                              + action.getCOSObject());
>>                      printPossibleJS(action);
>>                  }
>>                  PDAnnotationAdditionalActions actions = widget.getActions();
>>                  if (actions != null)
>>                  {
>>                      System.out.println(field.getFullyQualifiedName() + ": "
>>                              + actions.getClass().getSimpleName() + " js widget actionS:\n"
>>                              + actions.getCOSObject());
>>
>>                      // Strange: why do I get a PDAnnotationAdditionalActions (which contains
>>                      // a K entry but has no getK()) instead of a PDFormFieldAdditionalActions?
>>                      // Wrapping the same /AA dictionary makes the K, C, F and V triggers readable.
>>                      PDFormFieldAdditionalActions ffActions =
>>                              new PDFormFieldAdditionalActions((COSDictionary) actions.getCOSObject());
>>                      printPossibleJS(ffActions.getK());
>>                      printPossibleJS(ffActions.getC());
>>                      printPossibleJS(ffActions.getF());
>>                      printPossibleJS(ffActions.getV());
>>                  }
>>              }
>>          }
>>
>>          if (field instanceof PDNonTerminalField)
>>          {
>>              if (!sParent.equals(field.getPartialName()))
>>              {
>>                  if (partialName != null)
>>                  {
>>                      sParent = sParent + "." + partialName;
>>                  }
>>              }
>>              //System.out.println(sLevel + sParent);
>>
>>              for (PDField child : ((PDNonTerminalField) field).getChildren())
>>              {
>>                  processField(child, "|  " + sLevel, sParent);
>>              }
>>          }
>>          else
>>          {
>>              String fieldValue = field.getValueAsString();
>>              StringBuilder outputString = new StringBuilder(sLevel);
>>              outputString.append(sParent);
>>              if (partialName != null)
>>              {
>>                  outputString.append(".").append(partialName);
>>              }
>>              outputString.append(" = ").append(fieldValue);
>>              outputString.append(", type=").append(field.getClass().getName());
>>              //System.out.println(outputString);
>>          }
>>      }
>>
>>      private void printPossibleJS(PDAction kAction)
>>      {
>>          if (kAction instanceof PDActionJavaScript)
>>          {
>>              PDActionJavaScript jsAction = (PDActionJavaScript) kAction;
>>              String jsString = jsAction.getAction();
>>              if (jsString == null)
>>              {
>>                  return; // JavaScript action without a readable /JS entry
>>              }
>>              if (!jsString.contains("\n"))
>>              {
>>                  // Otherwise nothing shows up in NetBeans?!
>>                  // (the script uses bare CR line endings; convert them to LF)
>>                  jsString = jsString.replaceAll("\r", "\n").replaceAll("\n\n", "\n");
>>              }
>>              System.out.println(jsString);
>>              System.out.println();
>>          }
>>      }
>>
>>      /**
>>       * This will read a PDF file and print the JavaScript actions of its form fields.
>>       * Usage: pass the path of the PDF file as the first command-line argument.
>>       *
>>       * @param args command line arguments
>>       *
>>       * @throws IOException If there is an error reading the PDF document.
>>       */
>>      public static void main(String[] args) throws IOException
>>      {
>>          // PDFBox 2.0.x API; PDFBox 3.0 replaces PDDocument.load() with Loader.loadPDF()
>>          try (PDDocument pdf = PDDocument.load(new File(args[0])))
>>          {
>>              PrintJavaScriptFields exporter = new PrintJavaScriptFields();
>>              exporter.printFields(pdf);
>>          }
>>      }
>>
>> }
>>
>>
>>
>>> Thank you.
>>>
>>>          Davide
>>>
>>> ________________________________________
>>> From: Tilman Hausherr [THausherr@t-online.de]
>>> Sent: Tuesday, 23 August 2016 8:34
>>> To: users@pdfbox.apache.org
>>> Subject: Re: Check for scripts in a PDF
>>>
>>> On 22.08.2016 at 15:14, Davide Zoni wrote:
>>>> Hello everybody,
>>>>
>>>> I'm using PDFBox to check whether a PDF file contains malicious scripts. I'm using
>>>> the PDF/A-1a validation to check the file. Since I'm looking only for potentially damaging
>>>> code and not for true PDF/A-1a compliance, is it enough to consider 1.x.x, 6.x.x
>>>> and 7.x.x errors as "true" errors? Below is the category description:
>>>>
>>>> Category        Description
>>>> 1[.y[.z]]       Syntax Error
>>>> 2[.y[.z]]       Graphic Error
>>>> 3[.y[.z]]       Font Error
>>>> 4[.y[.z]]       Transparency Error
>>>> 5[.y[.z]]       Annotation Error
>>>> 6[.y[.z]]       Action Error
>>>> 7[.y[.z]]       Metadata Error
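Mechanically, keeping only categories 1, 6 and 7 is a simple check on the first component of each error code. A minimal sketch, assuming the codes are available as strings such as "6.2.5" (as returned by preflight's ValidationError.getErrorCode()); the PreflightCategoryFilter class is hypothetical. It only narrows down PDF/A validation findings and does not by itself establish that a file is free of scripts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper: keeps only preflight error codes whose top-level category
 * (the part before the first dot, e.g. "6" in "6.2.5") is in the given set.
 */
class PreflightCategoryFilter {

    // Category descriptions from the table above (index = category number)
    static final String[] DESCRIPTIONS = {
        null, "Syntax Error", "Graphic Error", "Font Error",
        "Transparency Error", "Annotation Error", "Action Error", "Metadata Error"
    };

    /** Extracts the top-level category, e.g. 6 for "6.2.5" and 7 for "7". */
    static int category(String errorCode) {
        String code = errorCode.trim();
        int dot = code.indexOf('.');
        return Integer.parseInt(dot < 0 ? code : code.substring(0, dot));
    }

    /** Returns the codes whose category is in the wanted set, preserving order. */
    static List<String> keep(List<String> errorCodes, Set<Integer> categories) {
        List<String> kept = new ArrayList<>();
        for (String code : errorCodes) {
            if (categories.contains(category(code))) {
                kept.add(code);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Categories 1 (syntax), 6 (action) and 7 (metadata), as proposed in the question
        Set<Integer> wanted = new HashSet<>(Arrays.asList(1, 6, 7));
        List<String> codes = Arrays.asList("1.2.1", "2.4.3", "6.2.5", "7.1");
        for (String code : keep(codes, wanted)) {
            System.out.println(code + " " + DESCRIPTIONS[category(code)]);
        }
    }
}
```

With the sample codes in main, the filter keeps 1.2.1, 6.2.5 and 7.1 and drops the graphic error 2.4.3.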
>>> Unclear what you're asking. Are you trying to detect files with
>>> JavaScript? If so, I'd rather build something from scratch,
>>> i.e. read the PDF specification and see where JS is used. Maybe compare
>>> with the preflight source code to check that you didn't miss something.
>>>
>>> Tilman
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>
>>> The content, information and any attachments of this e-mail are classified, confidential
>>> and not binding on Cedacri S.p.A.; any dissemination or disclosure is therefore prohibited.
>>> If you are not the named recipient, please notify the sender immediately, delete this
>>> message and do not disclose the contents to another person, use it for any purpose,
>>> or store or copy the information in any medium.
>>>
>>
>

