pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Suppressing layers on output
Date Sun, 19 Jun 2016 14:11:32 GMT
On 19.06.2016 at 08:52, John Hewson wrote:
>>> JIRA, and attach your code as a patch / diff.
>> There is already some code handling those operators, see PDFMarkedContentExtractor.
>> It could be moved to a more generic place so that we have to add some filtering only.
> Yes, that is the proper way to handle this. Operators are handled with an OperatorProcessor,
> not by modifying the parser (e.g. processStreamOperators). Better yet, we already have the
> code to handle BMC/EMC. All that is needed is for PDFRenderer to add a constructor which accepts
> a list of layer names to render, which are then passed as part of PageDrawerParameters.

The problem is that these two operators determine whether all the 
other tokens in the content stream are used. So the method by C. 
makes sense to me. The alternative would be to alter every operator 
processor to check whether it is relevant, or to have them all 
extend some common class that does this.
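
To make the point concrete, here is a minimal sketch of filtering at the token level. It is not PDFBox code: the Op class, the filter and sample methods and the "/oc1" to "/oc3" names are made up for illustration, and it assumes that the last operand of BDC names the layer. It only shows why a single check around the marked-content operators is enough to drop everything in between, including nested sequences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Simplified stand-in for a content stream; hypothetical, NOT the PDFBox API.
class LayerFilterSketch
{
    // One operation: an operator plus its operands, e.g. "BDC" with ["/OC", "/oc3"].
    static final class Op
    {
        final String operator;
        final List<String> operands;

        Op(String operator, String... operands)
        {
            this.operator = operator;
            this.operands = Arrays.asList(operands);
        }

        @Override
        public String toString()
        {
            return operands.isEmpty() ? operator : String.join(" ", operands) + " " + operator;
        }
    }

    // Returns the operations that survive when the given layers are suppressed.
    // The marked-content operators themselves are consumed by the filter.
    static List<String> filter(List<Op> stream, Set<String> hiddenLayers)
    {
        List<String> kept = new ArrayList<>();
        int depth = 0;        // current marked-content nesting level
        int hiddenFrom = -1;  // level at which suppression started, -1 while visible
        for (Op op : stream)
        {
            if (op.operator.equals("BMC") || op.operator.equals("BDC"))
            {
                depth++;
                // Assumption: for "/OC /oc3 BDC" the last operand identifies the layer.
                boolean hidden = op.operator.equals("BDC") && !op.operands.isEmpty()
                        && hiddenLayers.contains(op.operands.get(op.operands.size() - 1));
                if (hiddenFrom < 0 && hidden)
                {
                    hiddenFrom = depth;
                }
            }
            else if (op.operator.equals("EMC"))
            {
                if (depth == hiddenFrom)
                {
                    hiddenFrom = -1; // the suppressed sequence ends here
                }
                if (depth > 0)
                {
                    depth--;
                }
            }
            else if (hiddenFrom < 0)
            {
                kept.add(op.toString());
            }
        }
        return kept;
    }

    // Sample stream loosely modelled on a page with three layers.
    static List<Op> sample()
    {
        return Arrays.asList(
                new Op("BDC", "/OC", "/oc1"), new Op("Tj", "(background)"), new Op("EMC"),
                new Op("BDC", "/OC", "/oc2"), new Op("Tj", "(enabled)"), new Op("EMC"),
                new Op("BDC", "/OC", "/oc3"), new Op("Tj", "(disabled)"), new Op("EMC"));
    }

    public static void main(String[] args)
    {
        // prints [(background) Tj, (enabled) Tj]
        System.out.println(filter(sample(), Collections.singleton("/oc3")));
    }
}
```

With the check in one place, no operator processor needs to know about layers. The alternative would put the same "am I inside a hidden sequence" test into every processor or into a shared base class.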

PDFMarkedContentExtractor is not really helpful. Here's some code to 
show what it does - it shows the objects that belong to a specific 
group. The output cannot be used for rendering.

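import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.documentinterchange.markedcontent.PDMarkedContent;
import org.apache.pdfbox.pdmodel.documentinterchange.markedcontent.PDPropertyList;
import org.apache.pdfbox.pdmodel.graphics.optionalcontent.PDOptionalContentGroup;
import org.apache.pdfbox.pdmodel.graphics.optionalcontent.PDOptionalContentProperties;
import org.apache.pdfbox.text.PDFMarkedContentExtractor;

// The archive dropped the brace-only lines and the continuation of every
// wrapped statement. Right-hand sides marked "reconstructed" are inferred
// from the output below (PDFBox 2.0 API) and are an assumption.
public class ExtractMarkedContent extends PDFMarkedContentExtractor
{
    public ExtractMarkedContent() throws IOException
    {
    }

    public static void main(String[] args) throws IOException
    {
        // The file path is truncated in the archive and is left elided.
        PDDocument doc = PDDocument.load(new File("C......\\PDFBox ..."));
        PDOptionalContentProperties ocp =
                doc.getDocumentCatalog().getOCProperties(); // reconstructed
        System.out.println("Group names in document catalog: " +
                Arrays.asList(ocp.getGroupNames())); // reconstructed
        for (String groupName : ocp.getGroupNames())
        {
            PDOptionalContentGroup group = ocp.getGroup(groupName);
            System.out.println(group.getCOSObject()); // reconstructed
        }
        ExtractMarkedContent extractMarkedContent = new
                ExtractMarkedContent(); // reconstructed
        PDPage page = doc.getPage(0);
        System.out.println("Property names in page resources: " +
                page.getResources().getPropertiesNames()); // reconstructed
        extractMarkedContent.processPage(page); // reconstructed
        List<PDMarkedContent> markedContents =
                extractMarkedContent.getMarkedContents(); // reconstructed
        System.out.println("Extracted contents: ");
        for (PDMarkedContent mc : markedContents)
        {
            // Look up the property list for the tag, then read its /Name entry.
            PDPropertyList propertyList =
                    page.getResources().getProperties(COSName.getPDFName(mc.getTag())); // reconstructed
            String propName =
                    propertyList.getCOSObject().getString(COSName.NAME); // reconstructed
            System.out.println(mc.getTag() + " (" + propName + "): " +
                    mc.getContents()); // reconstructed
        }
        doc.close(); // reconstructed
    }
}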

The output is:

Group names in document catalog: [background, enabled, disabled]
(COSName{Name}:COSString{background}) }
(COSName{Name}:COSString{enabled}) }
(COSName{Name}:COSString{disabled}) }
Property names in page resources: [COSName{oc1}, COSName{oc2}, COSName{oc3}]
Extracted contents:
oc1 (background): [P, D, F,  , 1, ., 5, :,  , O, p, t, i, o, n, a, l,  , 
C, o, n, t, e, n, t,  , G, r, o, u, p, s, Y, o, u,  , s, h, o, u, l, d,  
, s, e, e,  , a,  , g, r, e, e, n,  , t, e, x, t, l, i, n, e, ,,  , b, 
u, t,  , n, o,  , r, e, d,  , t, e, x, t,  , l, i, n, e, .]
oc2 (enabled): [T, h, i, s,  , i, s,  , f, r, o, m,  , a, n,  , e, n, a, 
b, l, e, d,  , l, a, y, e, r, .,  , I, f,  , y, o, u,  , s, e, e,  , t, 
h, i, s, ,,  , t, h, a, t, ', s,  , g, o, o, d, .]
oc3 (disabled): [T, h, i, s,  , i, s,  , f, r, o, m,  , a,  , d, i, s, 
a, b, l, e, d,  , l, a, y, e, r, .,  , I, f,  , y, o, u,  , s, e, e,  , 
t, h, i, s, ,,  , t, h, a, t, ', s,  , N, O, T,  , g, o, o, d, !]
