pdfbox-users mailing list archives

From Roberto Nibali <rnib...@gmail.com>
Subject Re: PDFBox for JavaScript analysis
Date Mon, 18 Jan 2016 22:19:42 GMT
Hi

One of my PDF library methods looks like this (stripped of references to
other internal library calls that aren't relevant to your question):

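import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

private void executeDumpJS(String srcDocName) throws IOException {
    // try-with-resources closes the document exactly once, also on errors
    try (PDDocument srcDoc = PDDocument.load(new File(srcDocName))) {
        final PDAcroForm acroForm = srcDoc.getDocumentCatalog().getAcroForm();
        if (acroForm == null) {
            return; // no interactive form, so no field JavaScript to dump
        }
        acroForm.getFields().forEach(this::dumpJSEntry);
    }
}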

The input string is the path to the PDF file. The dumpJSEntry() method
looks as follows:

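import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;
import org.apache.pdfbox.pdmodel.interactive.form.PDSignatureField;

private void dumpJSEntry(PDField srcField) {
    if (srcField instanceof PDNonTerminalField) {
        // recurse into the children of a field group
        ((PDNonTerminalField) srcField).getChildren().forEach(this::dumpJSEntry);
    } else if (!(srcField instanceof PDSignatureField)) {
        // terminal, non-signature field: dump its JavaScript
        dumpJavaScriptEntries(srcField);
    }
}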
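To see the recursion of dumpJSEntry() in isolation, here is a minimal PDFBox-free sketch. FieldNode and FieldWalker are hypothetical stand-ins invented for illustration (not PDFBox classes); it also assumes that a fully qualified name is the partial names joined with periods, which is how PDF forms build names such as "address.zip".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for PDField: either a group (non-terminal) with
// children, or a terminal field that may be a signature field.
class FieldNode {
    final String name;
    final boolean terminal;
    final boolean signature;
    final List<FieldNode> children;

    private FieldNode(String name, boolean terminal, boolean signature,
                      List<FieldNode> children) {
        this.name = name;
        this.terminal = terminal;
        this.signature = signature;
        this.children = children;
    }

    static FieldNode terminal(String name, boolean signature) {
        return new FieldNode(name, true, signature, Collections.<FieldNode>emptyList());
    }

    static FieldNode group(String name, FieldNode... children) {
        return new FieldNode(name, false, false, Arrays.asList(children));
    }
}

class FieldWalker {
    // Same shape as dumpJSEntry(): recurse into groups, skip signature
    // fields, and collect the fully qualified name of every other terminal.
    private static void collect(FieldNode node, String parentName, List<String> out) {
        String fqName = parentName.isEmpty() ? node.name : parentName + "." + node.name;
        if (!node.terminal) {
            for (FieldNode child : node.children) {
                collect(child, fqName, out);
            }
        } else if (!node.signature) {
            out.add(fqName);
        }
    }

    static List<String> terminalNonSignatureFields(List<FieldNode> roots) {
        List<String> out = new ArrayList<>();
        for (FieldNode root : roots) {
            collect(root, "", out);
        }
        return out;
    }
}
```

For example, a group "address" with terminal children "street" and "zip", a signature field and a plain "total" field yield the names address.street, address.zip and total.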

This recursion calls dumpJavaScriptEntries() for every terminal field that
is not a signature field; that method finally dumps the JavaScript portions
of your PDF (courtesy of Tilman Hausherr):

import java.util.Locale;
import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionJavaScript;
import org.apache.pdfbox.pdmodel.interactive.action.PDFormFieldAdditionalActions;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

private void dumpJavaScriptEntries(PDField field) {
    final String fqName = field.getFullyQualifiedName();

    // Additional actions (/AA) attached to the field itself
    final PDFormFieldAdditionalActions fieldActions = field.getActions();
    if (fieldActions != null) {
        System.out.printf(Locale.ENGLISH, "// %s [%s]:%n", fqName,
                fieldActions.getClass().getSimpleName());

        // K: action performed when the user types a keystroke into a text
        // field or combo box or modifies the selection in a scrollable list
        // box. This allows the keystroke to be checked for validity and
        // rejected or modified.
        printPossibleJS(fieldActions.getK());
        // C: action performed in order to recalculate the value of this
        // field when that of another field changes.
        printPossibleJS(fieldActions.getC());
        // F: action performed before the field is formatted to display its
        // current value. This allows the field's value to be modified
        // before formatting.
        printPossibleJS(fieldActions.getF());
        // V: action performed when the field's value is changed. This
        // allows the new value to be checked for validity.
        printPossibleJS(fieldActions.getV());
    }

    // dumpJSEntry() only passes terminal fields here, so the cast is safe.
    final PDTerminalField termField = (PDTerminalField) field;
    for (PDAnnotationWidget widget : termField.getWidgets()) {
        // Action (/A) performed when the widget annotation is activated
        final PDAction action = widget.getAction();
        if (action instanceof PDActionJavaScript) {
            System.out.printf(Locale.ENGLISH, "// %s [%s]:%n", fqName,
                    action.getClass().getSimpleName());
            printPossibleJS(action);
        }
    }
}
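The four getters used in dumpJavaScriptEntries() read the K, C, F and V trigger entries of the field's additional-actions (/AA) dictionary. As a PDFBox-free sketch, assuming the /AA dictionary has already been reduced to a plain Map from trigger key to script text (FieldTriggers and describe() are hypothetical names, not PDFBox API), the dump order and the skipping of absent triggers look like this:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldTriggers {
    // Trigger keys of a form field's /AA dictionary, in the order that
    // dumpJavaScriptEntries() queries them via getK(), getC(), getF(), getV().
    static final Map<String, String> TRIGGERS = new LinkedHashMap<>();

    static {
        TRIGGERS.put("K", "keystroke or list selection change");
        TRIGGERS.put("C", "recalculation after another field changes");
        TRIGGERS.put("F", "before formatting for display");
        TRIGGERS.put("V", "value changed (validation)");
    }

    // aa: hypothetical simplified /AA dictionary, trigger key -> script.
    // Absent triggers are skipped, much like printPossibleJS() ignores
    // anything that is not a JavaScript action.
    static List<String> describe(Map<String, String> aa) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> trigger : TRIGGERS.entrySet()) {
            String js = aa.get(trigger.getKey());
            if (js != null) {
                lines.add("// " + trigger.getKey() + ": " + trigger.getValue());
                lines.add(js);
            }
        }
        return lines;
    }
}
```

A field with only a V and a K script produces the K header and script first, then the V header and script, regardless of insertion order in the input map.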

Only one piece is still missing: printPossibleJS(), which again originates
from code written by Tilman Hausherr:

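import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionJavaScript;

private void printPossibleJS(PDAction action) {
    if (action instanceof PDActionJavaScript) {
        String jsString = ((PDActionJavaScript) action).getAction();
        if (jsString == null) {
            return; // action has no /JS entry
        }
        // Some PDFs use bare CR line endings; convert them to LF so the
        // script prints on separate lines (doubled breaks are collapsed).
        if (!jsString.contains("\n")) {
            jsString = jsString.replace("\r", "\n").replace("\n\n", "\n");
        }
        System.out.println(jsString);
        System.out.println();
    }
}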
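The line-ending fix-up in printPossibleJS() does not depend on PDFBox, so it can be pulled out and checked on its own. A minimal sketch, using a hypothetical helper class JsText that applies the same rule:

```java
class JsText {
    // Same rule as printPossibleJS(): only when the script contains no LF
    // at all, treat CR as the line separator and collapse doubled breaks.
    static String normalize(String js) {
        if (js.contains("\n")) {
            return js; // already has LF (including CRLF): leave untouched
        }
        return js.replace("\r", "\n").replace("\n\n", "\n");
    }
}
```

Because String.replace() works left to right in a single pass, a run of three CRs still leaves one empty line, and scripts that already contain LF, including CRLF ones, come back unchanged.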

I couldn't find a simpler way to do this, since a PDF is basically a
directed graph of objects. Pick out the pieces you need.
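Because the document is a graph, another option is to walk every object reachable from the root and flag those that carry a /JS entry; that could also catch scripts not attached to form fields. This is sketched here in plain Java with a hypothetical PdfNode type, not PDFBox classes, and it is a deliberately simplified model: arrays, streams and indirect-reference resolution are not represented, and each reference is keyed by a single name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, much simplified PDF object: a dictionary whose values are
// either plain strings or references to other objects.
class PdfNode {
    final int objectNumber;
    final Map<String, String> strings = new LinkedHashMap<>();
    final Map<String, PdfNode> refs = new LinkedHashMap<>();

    PdfNode(int objectNumber) {
        this.objectNumber = objectNumber;
    }
}

class JsFinder {
    // Breadth-first walk from the root (e.g. the catalog). The visited set
    // is required because PDF graphs contain cycles such as /Parent links.
    static List<Integer> objectsWithJs(PdfNode root) {
        List<Integer> found = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<PdfNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            PdfNode node = queue.poll();
            if (!visited.add(node.objectNumber)) {
                continue; // already reached via another path
            }
            if (node.strings.containsKey("JS")) {
                found.add(node.objectNumber);
            }
            queue.addAll(node.refs.values());
        }
        return found;
    }
}
```

With PDFBox, the same idea could presumably be applied to its low-level COS objects (COSDictionary, COSArray, COSObject); the sketch does not attempt that.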

Hope it helps.

Cheers

Roberto



On Mon, Jan 18, 2016 at 6:43 AM, Alin Ghitulan <alinghitulan@gmail.com> wrote:

> Hello,
>
> Can anyone help me accomplish this? I need some direction on how to obtain
> a list of the objects in a PDF that contain JavaScript code so I can
> further process the JS code.
>
> Thanks,
> Alin
>
