pdfbox-users mailing list archives

From alexander.kriegi...@extern.sdv-it.de
Subject Re: How to merge PDF/A-1b documents and keep conformity
Date Fri, 15 Apr 2016 10:35:42 GMT
Basically, your hack works if I replace PDFMergerUtility with a patched 
copy (extending it is not an option, even in the same package, because 
'appendDocument()' needs private members). I had to modify your snippet 
as follows to avoid adding multiple intents, which leads to a validation 
error:

  private boolean hasIntent = false;
  ...
  public void appendDocument(PDDocument destination, PDDocument source) throws IOException
  {
    ...
    // Copy output intents from the first source only; adding them again
    // for every source makes the merged file fail validation.
    if (!hasIntent) {
      hasIntent = true;
      List<PDOutputIntent> srcOutputIntents = srcCatalog.getOutputIntents();
      for (PDOutputIntent outputIntent : srcOutputIntents) {
        destCatalog.addOutputIntent(outputIntent);
      }
    }
    ...
  }

It would be really nice if I could tell the merger either to set a given 
output intent or to copy the first one, as shown above. How do I achieve 
this without duplicating your original code? An additional parameter on 
the PDFMergerUtility constructor or on mergeDocuments(), for setting the 
desired PDF/A standard type or at least the top-level output intent, 
would help a lot.



From:    alexander.kriegisch@extern.sdv-it.de
To:      users@pdfbox.apache.org
Date:    15.04.2016 11:11
Subject: Reply: Re: How to merge PDF/A-1b documents and keep conformity



Hi Tilman.

What exactly do you need to know except for what I already told you in the 
"situation" paragraph? We currently use something like this:

public InputStream merge(final List<InputStream> sources) throws IOException {
  PDFMergerUtility merger = new PDFMergerUtility();
  for (InputStream source : sources) {
    logger.trace("PDF merger source = {}", source);
    merger.addSource(source);
  }
  PipedOutputStream outputStream = new PipedOutputStream();
  PipedInputStream inputStream = new PipedInputStream(outputStream);
  merger.setDestinationStream(outputStream);
  new Thread(() -> {
    try {
      merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
    } catch (IOException e) {
      logger.error("PDF merge problem", e);
    }
  }).start();
  return inputStream;
}

Does that help? By the way, I need an automated, stable PDF merge 
solution, not a one-time hack that includes manual editing in Notepad++. 
Furthermore, I cannot just add code to your API; I would like to use the 
API as is. I tried a quick-and-dirty subclass of PDFMergerUtility that 
overrides 'appendDocument', copying all the original source code. But 
that method uses non-public classes like PDFCloneUtility, non-public 
members, etc. I could only try to use the same package as the original, 
but this is not nice.

The source documents are, as I said, PDF/A-1b compliant, all of them 
created by the same output management system. So I guess the output 
intents (whatever that means) are similar or identical.

Regards
--
Alexander Kriegisch




From:    Tilman Hausherr <THausherr@t-online.de>
To:      users@pdfbox.apache.org
Date:    13.04.2016 18:20
Subject: Re: How to merge PDF/A-1b documents and keep conformity



On 13.04.2016 at 12:03, alexander.kriegisch@extern.sdv-it.de wrote:
> Hi, I am new to this list.
>
> My profile is: experienced Java programmer, knowing how to use
> PDFMergerUtility, but not a PDF or even PDF/A-1b expert.
>
> Situation: We have a bunch of PDF/A-1b compliant documents from a 3rd
> party system and merge them into a new document. The end result is not
> PDF/A-1b compliant though.
>
> I found this on the mailing list archive:
> http://pdfbox-users.markmail.org/search/?q=merge%20pdf%2Fa#query:merge%20pdf%2Fa+page:1+mid:uwvybz6lhgof3agg+state:results
>
> Is there a better answer today than to look into PDFMergerUtility sources?
> Because this class is what we are using, but it does not do it, at least
> not in version 1.8.9. Is there a reason to assume that this has changed in
> 2.x?
>
You didn't mention what went wrong. I had that problem once with 2 files 
from the same source. What I did was:

1) In the 2.0 source code (I won't bother with 1.8), add this in 
PDFMergerUtility.appendDocument() above the comment "merge logical 
structure hierarchy":

         List<PDOutputIntent> srcOutputIntents = srcCatalog.getOutputIntents();
         for (PDOutputIntent outputIntent : srcOutputIntents)
         {
             destCatalog.addOutputIntent(outputIntent);
         }

Then I edited the resulting PDF manually to remove one of the output 
intents. The resulting PDF should contain something like this:

/OutputIntents [7 0 R 8 0 R]

Just blank out one of the two, e.g. like this:

/OutputIntents [7 0 R      ]

Make sure that you don't change any byte positions, i.e. switch your 
editor (Notepad++) to overwrite mode.
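
For a repeatable version of this manual step, the same-length overwrite can be sketched in plain Java. This is only an illustration under assumptions: the class and method names (BlankOutputIntent, keepFirstIntent) are made up, and it presumes that the /OutputIntents array is stored uncompressed in the file and that the first match is the one in the document catalog. Everything after the first reference is replaced by spaces so that no byte offsets move.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlankOutputIntent {

    // Group 1: "/OutputIntents [" plus the first "n g R" reference.
    // Group 2: everything after that reference up to the closing bracket.
    private static final Pattern OUTPUT_INTENTS = Pattern.compile(
            "(/OutputIntents\\s*\\[\\s*\\d+\\s+\\d+\\s+R)([^\\]]*)\\]");

    /** Returns a copy in which all references after the first are overwritten with spaces. */
    public static byte[] keepFirstIntent(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string offsets equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = OUTPUT_INTENTS.matcher(text);
        if (!m.find()) {
            return pdf.clone(); // nothing to patch
        }
        char[] blanks = new char[m.group(2).length()];
        Arrays.fill(blanks, ' ');
        String patched = text.substring(0, m.start(2))
                + new String(blanks)
                + text.substring(m.end(2));
        return patched.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] in = "<< /OutputIntents [7 0 R 8 0 R] >>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] out = keepFirstIntent(in);
        System.out.println(new String(out, StandardCharsets.ISO_8859_1));
        System.out.println("same length: " + (in.length == out.length));
    }
}
```

Because the output has exactly the same length as the input, offsets recorded elsewhere in the file stay valid, which is the point of using overwrite mode in the editor.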

This may or may not work... if the two files have different output 
intents, then you'll have surprises, obviously.

I haven't done any code changes... I don't know for sure which element of 
the outputIntent is the "key" (so that others with the same key can be 
skipped), and I don't know what I should do if files have different ones. 
I suspect it is "OutputConditionIdentifier".
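
If "OutputConditionIdentifier" does turn out to be a usable key, skipping duplicates could look roughly like the sketch below. The Intent class is a hypothetical stand-in, not a PDFBox type, and keeping the first intent per identifier is an assumption. It does not settle what to do when the files carry genuinely different intents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutputIntentDedup {

    /** Hypothetical stand-in for an output intent dictionary; not a PDFBox class. */
    public static final class Intent {
        public final String subtype;                   // e.g. "GTS_PDFA1"
        public final String outputConditionIdentifier; // e.g. "CGATS TR 001"

        public Intent(String subtype, String outputConditionIdentifier) {
            this.subtype = subtype;
            this.outputConditionIdentifier = outputConditionIdentifier;
        }
    }

    /** Keeps the first intent seen for each identifier, preserving encounter order. */
    public static List<Intent> dedupe(List<Intent> intents) {
        Map<String, Intent> byKey = new LinkedHashMap<>();
        for (Intent intent : intents) {
            // Assumed key: the OutputConditionIdentifier entry.
            byKey.putIfAbsent(intent.outputConditionIdentifier, intent);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Intent> merged = Arrays.asList(
                new Intent("GTS_PDFA1", "CGATS TR 001"),
                new Intent("GTS_PDFA1", "CGATS TR 001"),
                new Intent("GTS_PDFA1", "sRGB"));
        for (Intent intent : dedupe(merged)) {
            System.out.println(intent.subtype + " / " + intent.outputConditionIdentifier);
        }
    }
}
```

With PDFBox, the key would presumably come from PDOutputIntent.getOutputConditionIdentifier(), checked before calling destCatalog.addOutputIntent().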


Example of an outputIntent:

<<
/Type/OutputIntent
/S/GTS_PDFA1
/OutputCondition(U.S. Web Coated \(SWOP\) v2)
/OutputConditionIdentifier(CGATS TR 001)
/Info(U.S. Web Coated \(SWOP\) v2)
/DestOutputProfile 4 0 R
 >>

4 0 obj

<<
/N 4
/Filter/FlateDecode
/Length 389758
 >>
stream
...
endstream

endobj


If you tell me more about what you're trying to do (a one-time problem or 
not?), maybe I can help...

Tilman


