pdfbox-users mailing list archives

From "C. Alexander Leigh"...@a6v.org>
Subject Suppressing layers on output
Date Wed, 15 Jun 2016 17:10:31 GMT
I needed to suppress layers at render time, which came about because I am
working with GeoPDFs (maps) containing many layers. At any given time the
end-user is only going to be interested in viewing a subset of the layers.
I saw several people had asked this question in the past but I never saw an
implementation, so I thought I would post what I did in hopes it might
bring others good fortune.

Note: I do not claim this is a complete implementation. I am a novice at
the PDF specification, but it did work for me with commercially produced
PDFs. Is it right? I have no idea. It works. I implemented everything
outside of pdfbox, with one exception: I had to change
processStreamOperators() on PDFStreamEngine to public so that I could
override it.

All code herein is released by me, the author, into the public domain. Go
forth.

BDC and EMC operators need to be implemented. These track which layer the
enclosed content belongs to. They can nest, and not every BDC carries
layer information, but every BDC still has a matching EMC, so each one has
to be pushed onto the stack.
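
A minimal sketch of that bookkeeping, with no PDFBox types involved. The
class and method names are made up for illustration and are not part of the
code in this post. It also covers one case the processors below do not
handle: in the PDF specification an EMC closes a plain BMC as well as a BDC,
so a file that uses BMC would need a placeholder pushed for it too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: tracks open marked-content sections.
class MarkedContentStack {
    private final List<String> open = new ArrayList<>();

    // BDC: push the layer id, or null when the BDC carries no layer information.
    void beginWithProperties(String layerIdOrNull) {
        open.add(layerIdOrNull);
    }

    // BMC: no property list at all, but it is still closed by an EMC.
    void begin() {
        open.add(null);
    }

    // EMC: pop the most recent entry; an unbalanced EMC is ignored, not fatal.
    void end() {
        if (!open.isEmpty()) {
            open.remove(open.size() - 1);
        }
    }

    // Copy of the currently open entries, outermost first.
    List<String> snapshot() {
        return new ArrayList<>(open);
    }

    public static void main(String[] args) {
        MarkedContentStack s = new MarkedContentStack();
        s.beginWithProperties("MC0"); // e.g. /OC /MC0 BDC
        s.begin();                    // a BMC nested inside the layer
        s.end();                      // closes the BMC
        System.out.println(s.snapshot()); // prints [MC0]
        s.end();                      // closes the layer
        s.end();                      // unbalanced EMC, ignored
        System.out.println(s.snapshot()); // prints []
    }
}
```

Pushing a placeholder for every opener is what keeps each EMC paired with
the right entry, whether or not that entry names a layer.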

===

package org.a6v.pdf;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.operator.graphics.GraphicsOperatorProcessor;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.List;

public class BDCOperatorProcessor extends GraphicsOperatorProcessor {
    private static final Logger logger = LoggerFactory.getLogger(BDCOperatorProcessor.class);
    private final FilteringPageDrawer drawer;


    public BDCOperatorProcessor(FilteringPageDrawer drawer) {
        this.drawer = drawer;
    }

    /**
     * Process the operator.
     *
     * @param operator the operator to process
     * @param operands the operands to use when processing
     * @throws IOException if the operator cannot be processed
     */
    @Override
    public void process(Operator operator, List<COSBase> operands) throws IOException {
        // BDC takes two operands: a tag (e.g. /OC) and a property list.
        // For layers the property list is a name such as /MC0.
        // Always push something so the matching EMC has an entry to pop.
        if (operands.size() < 2) {
            logger.debug("Operands list was short");
            drawer.getBdcStack().add(null);
            return;
        }

        COSBase name = operands.get(1);

        if (!(name instanceof COSName)) {
            // Log the value itself; calling getClass() would fail on null
            logger.debug("Property list is not a COSName: {}", name);
            drawer.getBdcStack().add(null);
            return;
        }

        String n = ((COSName) name).getName();

        if (n == null) {
            logger.debug("Layer ID was null");
            drawer.getBdcStack().add(null);
            return;
        }

        logger.debug("Determined layer ID: {}", n);

        drawer.getBdcStack().add(n);
    }

    /**
     * Returns the name of this operator, "BDC".
     */
    @Override
    public String getName() {
        return "BDC";
    }
}

===

package org.a6v.pdf;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.operator.graphics.GraphicsOperatorProcessor;
import org.apache.pdfbox.cos.COSBase;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.List;

public class EMCOperatorProcessor extends GraphicsOperatorProcessor {
    private static final Logger logger = LoggerFactory.getLogger(EMCOperatorProcessor.class);
    private final FilteringPageDrawer drawer;


    public EMCOperatorProcessor(FilteringPageDrawer drawer) {
        logger.debug("Created");
        this.drawer = drawer;
    }

    /**
     * Process the operator.
     *
     * @param operator the operator to process
     * @param operands the operands to use when processing
     * @throws IOException if the operator cannot be processed
     */
    @Override
    public void process(Operator operator, List<COSBase> operands) throws IOException {
        logger.debug("called: {} {}", operator, operands);

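        List<String> stack = drawer.getBdcStack();

        // Guard against an unbalanced EMC, which would otherwise throw
        // IndexOutOfBoundsException on an empty stack.
        if (stack.isEmpty()) {
            logger.debug("EMC without a matching BDC, ignoring");
            return;
        }

        stack.remove(stack.size() - 1);

        logger.debug("Current content stack: {}", stack);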
    }

    /**
     * Returns the name of this operator, "EMC".
     */
    @Override
    public String getName() {
        return "EMC";
    }
}

===

Once this is done, only two classes need to be extended, the PDFRenderer
and the PageDrawer.

===

package org.a6v.pdf;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.rendering.PageDrawer;
import org.apache.pdfbox.rendering.PageDrawerParameters;

import java.io.IOException;
import java.util.HashSet;

/**
 * This implementation of <code>PDFRenderer</code> is capable of suppressing
 * layers on the render.
 *
 * @author C. Alexander Leigh
 */
public class FilteringRenderer extends PDFRenderer {
    private final HashSet<String> hiddenList;

    /**
     * Creates a new FilteringRenderer.
     *
     * @param document the document to render
     * @param hiddenList IDs of the layers to suppress, e.g. "MC2"
     */
    public FilteringRenderer(PDDocument document, HashSet<String> hiddenList) {
        super(document);
        this.hiddenList = hiddenList;
    }

    @Override
    protected PageDrawer createPageDrawer(PageDrawerParameters parameters) throws IOException {
        return new FilteringPageDrawer(parameters, hiddenList);
    }
}

===

The page drawer implementation is the real meat of it. Here we have to
track whether or not we are rendering the current layer. Note that children
are suppressed whenever a parent is hidden. While suppressing, we simply
skip the content on the stream unless it is a BDC or an EMC, which must
still be processed to keep the stack balanced. Once we are no longer
suppressing, everything goes back to normal.

===

package org.a6v.pdf;

import org.apache.pdfbox.contentstream.PDContentStream;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSObject;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.rendering.PageDrawer;
import org.apache.pdfbox.rendering.PageDrawerParameters;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class FilteringPageDrawer extends PageDrawer {
    private static final Logger logger = LoggerFactory.getLogger(FilteringPageDrawer.class);
    private final ArrayList<String> bdcStack = new ArrayList<>();
    private final HashSet<String> hiddenList;

    /**
     * Constructor.
     *
     * @param parameters Parameters for page drawing.
     * @param hiddenList IDs of the layers to suppress, e.g. "MC2".
     * @throws IOException If there is an error loading properties from the file.
     */
    public FilteringPageDrawer(PageDrawerParameters parameters, HashSet<String> hiddenList) throws IOException {
        super(parameters);
        this.hiddenList = hiddenList;
        addOperator(new EMCOperatorProcessor(this));
        addOperator(new BDCOperatorProcessor(this));

        logger.info("Created: {}", parameters);
    }

    /**
     * Returns <code>true</code> if the renderer is currently rendering,
     * otherwise returns <code>false</code>. If <code>false</code> is
     * returned, then anything in the PDF stream should not be rendered to
     * the output.
     *
     * Rendering is suppressed if any of the currently open optional content
     * tags is contained in the suppression list. As a side effect, the
     * children of a suppressed optional content group are suppressed too.
     */
    public boolean isRendering() {
        for (String idx : bdcStack) {
            if (hiddenList.contains(idx)) {
                logger.debug("Filtering...");
                return false;
            }
        }

        return true;
    }

    /**
     * Returns the current BDC stack for this drawer. Each time a BDC is
     * processed, the ID of the content is pushed onto this stack (or
     * <code>null</code> if the BDC carries no layer ID). When an EMC is
     * processed, the corresponding entry is popped off. This is coordinated
     * by the <code>BDCOperatorProcessor</code> and
     * <code>EMCOperatorProcessor</code> classes.
     *
     * @return the live stack of open marked-content entries
     */
    public List<String> getBdcStack() {
        return bdcStack;
    }

    /**
     * Processes the operators of the given content stream. This overrides
     * the method on PDFStreamEngine, which must first be made public.
     *
     * @param contentStream the content stream to parse.
     * @throws IOException if there is an error reading or parsing the
     *                     content stream.
     */
    @Override
    public void processStreamOperators(PDContentStream contentStream) throws IOException {
        logger.debug("Called");

        List<COSBase> arguments = new ArrayList<COSBase>();
        PDFStreamParser parser = new PDFStreamParser(contentStream);
        Object token = parser.parseNextToken();
        while (token != null) {

            if (isRendering()) {
                if (token instanceof COSObject) {
                    arguments.add(((COSObject) token).getObject());
                } else if (token instanceof Operator) {
                    processOperator((Operator) token, arguments);
                    arguments = new ArrayList<>();
                } else {
                    arguments.add((COSBase) token);
                }
            } else {
                // If we are not currently rendering, we only process EMC and BDC
                if (token instanceof Operator) {
                    String tokenName = ((Operator) token).getName();
                    if (tokenName.equals("BDC") || tokenName.equals("EMC")) {
                        processOperator((Operator) token, arguments);
                        arguments = new ArrayList<>();
                    }
                }
            }

            token = parser.parseNextToken();
        }
    }
}

===
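
The suppression rule can be tried out without a PDF at all. The following is
a simplified sketch, not the drawer above: the content stream is modelled as
a list of string arrays (operator first, then its operands), and the class
name, method name and sample operators are all assumptions made for the
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the drawer: decides which operators get drawn.
class LayerFilterSketch {

    // True when none of the open marked-content entries is in the hidden set.
    static boolean isRendering(List<String> stack, Set<String> hidden) {
        for (String id : stack) {
            if (id != null && hidden.contains(id)) {
                return false;
            }
        }
        return true;
    }

    // Each op is {operator, operand...}; returns the operators that are drawn.
    static List<String> visibleOperators(List<String[]> ops, Set<String> hidden) {
        List<String> stack = new ArrayList<>();
        List<String> drawn = new ArrayList<>();
        for (String[] op : ops) {
            String name = op[0];
            if (name.equals("BDC")) {
                // Second operand is the layer id; push null when it is missing
                stack.add(op.length > 2 ? op[2] : null);
            } else if (name.equals("EMC")) {
                if (!stack.isEmpty()) {
                    stack.remove(stack.size() - 1);
                }
            } else if (isRendering(stack, hidden)) {
                drawn.add(name);
            }
        }
        return drawn;
    }

    public static void main(String[] args) {
        List<String[]> ops = Arrays.asList(
                new String[]{"re"},
                new String[]{"BDC", "OC", "MC1"},
                new String[]{"f"},
                new String[]{"BDC", "OC", "MC2"},
                new String[]{"Do"},
                new String[]{"BDC", "OC", "MC3"}, // child of MC2
                new String[]{"S"},
                new String[]{"EMC"},
                new String[]{"EMC"},
                new String[]{"B"},
                new String[]{"EMC"},
                new String[]{"n"});
        Set<String> hidden = new HashSet<>(Arrays.asList("MC2"));
        System.out.println(visibleOperators(ops, hidden)); // prints [re, f, B, n]
    }
}
```

Hiding MC2 drops both its own content (Do) and the content of its child MC3
(S), while everything outside MC2 is still drawn.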

Note that the hidden list and most of this code work on the IDs for the
layers, not their names. The IDs are not invented by PDFBox: they are the
resource names under which the optional content groups are registered in
the page's Properties resources, which is why the loop below can list them.
You still have to build a translation from the layer description to the ID
for any of this to be useful. The IDs in my example ran MC0, MC1, ... MCn.
Note that the index order is not the same as the order in which the
optional content groups enumerate - I tried that first, ha ha. You actually
have to build the index. If yours happen to line up it is just a
coincidence.

The layers can be enumerated like this:

        PDPage page = doc.getPage(idx);
        PDResources res = page.getResources();

        for (COSName propName : res.getPropertiesNames()) {
            // Other kinds of property lists can appear here, so check the type
            PDPropertyList props = res.getProperties(propName);
            if (props instanceof PDOptionalContentGroup) {
                PDOptionalContentGroup mc = (PDOptionalContentGroup) props;
                logger.info("Prop: {} {}", propName, mc.getName());
            }
        }
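
Building on the loop above, the translation from layer names to IDs could be
sketched as follows. It assumes the ID and name pairs logged by that loop
have been collected into a map; the class name, method name and the layer
names in the example are invented for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: turns human-readable layer names into hidden-list IDs.
class LayerNameIndex {

    // idToName maps an ID such as "MC2" to the layer name reported for it.
    static HashSet<String> idsForNames(Map<String, String> idToName, Set<String> namesToHide) {
        HashSet<String> ids = new HashSet<>();
        for (Map.Entry<String, String> e : idToName.entrySet()) {
            // Collect every ID that points at a layer we want hidden
            if (namesToHide.contains(e.getValue())) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> idToName = new LinkedHashMap<>();
        idToName.put("MC0", "Contours");   // made-up layer names
        idToName.put("MC1", "Labels");
        idToName.put("MC2", "Orthoimage");

        Set<String> namesToHide = new HashSet<>();
        namesToHide.add("Orthoimage");

        System.out.println(idsForNames(idToName, namesToHide)); // prints [MC2]
    }
}
```

The result is a HashSet, so it can be handed to the FilteringRenderer
constructor as the hidden list.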

So for example, if you see that you want to hide MC2, put it in a HashSet
and pass that into a new FilteringRenderer. That looks something like this:

        HashSet<String> hidden = new HashSet<>();
        // Suppress the ortho
        hidden.add("MC2");

        FilteringRenderer renderer = new FilteringRenderer(doc, hidden);
        BufferedImage map = renderer.renderImageWithDPI(0, dpi);

I sincerely hope this helps someone facing this same challenge.

Thanks for listening!

-- 
C. Alexander Leigh
