pdfbox-users mailing list archives

From alexander.kriegi...@extern.sdv-it.de
Subject Reply: Re: How to merge PDF/A-1b documents and keep conformity
Date Fri, 15 Apr 2016 14:30:52 GMT
If you mean saving to a temp file, re-reading and manipulating it, and writing
it again, that is not an option because performance is very important for
us. As you can see from my code snippet, I am already using piped streams
to avoid disk I/O. But anyway, Maruan, what is your suggestion?
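A note on the piped-stream approach in the merge() method quoted below: if the merge thread fails before writing any bytes, the reader of the PipedInputStream can wait indefinitely, and a failure after a partial write only reaches the reader as a generic IOException without the original cause. The following is a minimal sketch using only java.io, assuming a hypothetical `Producer` callback that stands in for the merge call. It records the producer's exception before closing the pipe, so the reader gets an IOException carrying the real cause instead of hanging or silently reading a truncated PDF.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.atomic.AtomicReference;

final class PipedProducer {

    /** Hypothetical stand-in for the merge step (e.g. a mergeDocuments() call). */
    @FunctionalInterface
    interface Producer {
        void writeTo(OutputStream out) throws IOException;
    }

    private PipedProducer() {}

    static InputStream pipe(Producer producer) throws IOException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out, 64 * 1024);
        AtomicReference<Exception> failure = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try {
                producer.writeTo(out);
            } catch (IOException | RuntimeException e) {
                failure.set(e); // record the error BEFORE signalling end-of-stream
            } finally {
                try {
                    out.close(); // always signal end-of-stream so the reader never hangs
                } catch (IOException ignored) {
                    // nothing useful to do here; the reader is unblocked either way
                }
            }
        }, "pdf-merge-pipe");
        worker.setDaemon(true);
        worker.start();

        // At end-of-stream, rethrow the producer's failure instead of returning -1.
        return new FilterInputStream(in) {
            @Override
            public int read() throws IOException {
                int b = super.read();
                if (b == -1) {
                    rethrowFailure();
                }
                return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                int n = super.read(buf, off, len);
                if (n == -1) {
                    rethrowFailure();
                }
                return n;
            }

            private void rethrowFailure() throws IOException {
                Exception e = failure.get();
                if (e != null) {
                    throw new IOException("producer failed", e);
                }
            }
        };
    }
}
```

Used with the calls from the quoted snippet, this would look like `return PipedProducer.pipe(out -> { merger.setDestinationStream(out); merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly()); });`. Closing the pipe a second time is harmless if PDFBox has already closed the stream after saving.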




From:    Maruan Sahyoun <sahyoun@fileaffairs.de>
To:      users@pdfbox.apache.org
Date:    15.04.2016 12:54
Subject: Re: How to merge PDF/A-1b documents and keep conformity



Hi,

> On 15.04.2016 at 12:35, alexander.kriegisch@extern.sdv-it.de wrote:
> 
> Basically your hack works if I replace PDFMergerUtility with a modified
> copy (extending it is not an option, even in the same package, because
> 'appendDocument()' needs private members). I had to modify your snippet as
> follows to avoid adding multiple output intents, which led to a validation
> error:
> 
>  private boolean hasIntent = false;
>  ...
>  public void appendDocument(PDDocument destination, PDDocument source)
>      throws IOException
>  {
>    ...
>    if (!hasIntent) {
>      hasIntent = true;
>      List<PDOutputIntent> srcOutputIntents =
>        srcCatalog.getOutputIntents();
>      for (PDOutputIntent outputIntent : srcOutputIntents)
>        destCatalog.addOutputIntent(outputIntent);
>    }
>    ...
>  }
> 
> It would be really nice if I could either tell the merger to set a given
> output intent or to copy the first one, as shown above. How do I achieve
> this without duplicating your original code? An additional parameter on
> the PDFMergerUtility constructor or on mergeDocuments(), for setting the
> desired PDF/A standard type or at least the top-level output intent,
> would be really nice.

Would it be an option to do the merge first and afterwards fix the output
intents on the merged document, so that only the one that is needed / you'd
like to keep remains?
BR
Maruan

> 
> 
> 
> From:    alexander.kriegisch@extern.sdv-it.de
> To:      users@pdfbox.apache.org
> Date:    15.04.2016 11:11
> Subject: Reply: Re: How to merge PDF/A-1b documents and keep conformity
> 
> 
> 
> Hi Tilman.
> 
> What exactly do you need to know beyond what I already told you in the
> "situation" paragraph? We currently use something like this:
> 
> public InputStream merge(final List<InputStream> sources) throws IOException {
>  PDFMergerUtility merger = new PDFMergerUtility();
>  for (InputStream source : sources) {
>    logger.trace("PDF merger source = {}", source);
>    merger.addSource(source);
>  }
>  PipedOutputStream outputStream = new PipedOutputStream();
>  PipedInputStream inputStream = new PipedInputStream(outputStream);
>  merger.setDestinationStream(outputStream);
>  new Thread(() -> {
>    try {
>      merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
>    } catch (IOException e) {
>      logger.error("PDF merge problem", e);
>    }
>  }).start();
>  return inputStream;
> }
> 
> Does that help? By the way, I need an automated, stable PDF merge
> solution, not a one-time hack involving manual editing in Notepad++.
> Furthermore, I cannot just add code to your API; I would like to use the
> API as is. As a quick & dirty attempt, I tried to extend PDFMergerUtility
> with a subclass and override 'appendDocument', copying all the original
> source code. But that method uses non-public classes like
> PDFCloneUtility, non-public members etc. The only way around that would
> be to put my class into the same package as the original, which is not nice.
> 
> The source documents are, as I said, PDF/A-1b compliant, all of them
> created by the same output management system. So I guess the output
> intents (whatever that means) are similar or identical.
> 
> Regards
> --
> Alexander Kriegisch
> 
> 
> 
> 
> From:    Tilman Hausherr <THausherr@t-online.de>
> To:      users@pdfbox.apache.org
> Date:    13.04.2016 18:20
> Subject: Re: How to merge PDF/A-1b documents and keep conformity
> 
> 
> 
> On 13.04.2016 at 12:03, alexander.kriegisch@extern.sdv-it.de wrote:
>> Hi, I am new to this list.
>> 
>> My profile is: experienced Java programmer, knowing how to use
>> PDFMergerUtility, but not a PDF or even PDF/A-1b expert.
>> 
>> Situation: We have a bunch of PDF/A-1b compliant documents from a 3rd
>> party system and merge them into a new document. The end result is not
>> PDF/A-1b compliant though.
>> 
>> I found this on the mailing list archive:
>>
>> http://pdfbox-users.markmail.org/search/?q=merge%20pdf%2Fa#query:merge%20pdf%2Fa+page:1+mid:uwvybz6lhgof3agg+state:results
>>
>> Is there a better answer today than to look into the PDFMergerUtility
>> sources? This class is what we are using, but it does not do it, at least
>> not in version 1.8.9. Is there a reason to assume that this has changed in
>> 2.x?
>> 
> You didn't mention what went wrong. I had that problem once with 2 files
> from the same source; what I did is:
> 
> 1) in 2.0 source code (I won't bother with 1.8) add this in 
> PDFMergerUtility.appendDocument() above the comment "merge logical 
> structure hierarchy":
> 
>         List<PDOutputIntent> srcOutputIntents = srcCatalog.getOutputIntents();
>         for (PDOutputIntent outputIntent : srcOutputIntents)
>         {
>             destCatalog.addOutputIntent(outputIntent);
>         }
> 
> then I edited the result PDF manually to remove one of the output 
> intents. The result PDF should have something like this:
> 
> /OutputIntents [7 0 R 8 0 R]
> 
> just blank one of the two, e.g. like this:
> 
> /OutputIntents [7 0 R      ]
> 
> Make sure that you don't change any byte positions (the cross-reference
> table stores byte offsets), i.e. switch your editor (NOTEPAD++) to
> overwrite mode.
> 
> This may or may not work... if the two files have different output 
> intents, then you'll have surprises, obviously.
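Since an automated solution was requested, the manual Notepad++ edit described above can in principle be done in code. Below is a minimal sketch using only the JDK, assuming the merged PDF is held in memory as a byte array and that the catalog's /OutputIntents array is written as plain text rather than inside a compressed object stream (as far as I know, PDFBox 2.0 does not write object streams). The class and method names are hypothetical. It keeps the first reference in the array and overwrites the rest with spaces, so the file length and all cross-reference offsets stay unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OutputIntentPatcher {

    // "/OutputIntents [" + first indirect reference (group 1) + rest of the array (group 2) + "]"
    private static final Pattern OUTPUT_INTENTS = Pattern.compile(
            "/OutputIntents\\s*\\[\\s*(\\d+\\s+\\d+\\s+R)([^\\]]*)\\]");

    private OutputIntentPatcher() {}

    /**
     * Returns a copy of the PDF in which only the first /OutputIntents entry is kept.
     * The remaining entries are overwritten with spaces, so no byte offsets change.
     */
    static byte[] keepFirstOutputIntent(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so char index == byte index.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = OUTPUT_INTENTS.matcher(text);
        byte[] patched = pdf.clone();
        if (m.find()) {
            for (int i = m.start(2); i < m.end(2); i++) {
                patched[i] = ' ';
            }
        }
        return patched;
    }
}
```

The dropped output intent and its ICC profile stay in the file as unreferenced objects. Like the manual edit, this only makes sense when all sources share the same output intent, and it requires buffering the whole merged PDF instead of streaming it through the pipe.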
> 
> I haven't made any code changes... I don't know for sure which element of
> the output intent is the "key" (so that others with the same key can be
> skipped), and I don't know what I should do if the files have different
> ones. I suspect it is "OutputConditionIdentifier".
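The open question above (which field identifies an output intent, and what to do when the files differ) can be sketched as plain logic, independent of the PDFBox API. The sketch assumes, as suspected above, that /OutputConditionIdentifier together with the subtype /S (e.g. GTS_PDFA1) is the key. The `Intent` class is a hypothetical stand-in for the dictionary fields, which in PDFBox 2.0 would be read from a PDOutputIntent (e.g. via getOutputConditionIdentifier()). As far as I know, PDF/A-1 requires all output intents that carry a /DestOutputProfile to reference the same indirect ICC profile object, which would explain why two copied intents fail validation even when their contents are identical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OutputIntentKeys {

    /** Hypothetical stand-in for the relevant fields of an /OutputIntent dictionary. */
    static final class Intent {
        final String subtype;                   // value of /S, e.g. "GTS_PDFA1"
        final String outputConditionIdentifier; // e.g. "CGATS TR 001"

        Intent(String subtype, String outputConditionIdentifier) {
            this.subtype = subtype;
            this.outputConditionIdentifier = outputConditionIdentifier;
        }
    }

    private OutputIntentKeys() {}

    /** Keeps the first intent for each (subtype, identifier) pair, preserving encounter order. */
    static List<Intent> dedupe(List<Intent> intents) {
        Map<List<String>, Intent> byKey = new LinkedHashMap<>();
        for (Intent intent : intents) {
            // Arrays.asList tolerates null fields and compares element-wise
            List<String> key = Arrays.asList(intent.subtype, intent.outputConditionIdentifier);
            byKey.putIfAbsent(key, intent);
        }
        return new ArrayList<>(byKey.values());
    }

    /** True if more than one GTS_PDFA1 intent remains, i.e. the sources used different conditions. */
    static boolean hasPdfaConflict(List<Intent> deduped) {
        int count = 0;
        for (Intent intent : deduped) {
            if ("GTS_PDFA1".equals(intent.subtype)) {
                count++;
            }
        }
        return count > 1;
    }
}
```

If hasPdfaConflict(...) returns true, there is no safe automatic choice: the sources really target different output conditions, and the merge would need a decision outside the scope of this sketch.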
> 
> 
> Example of an outputIntent:
> 
> <<
> /Type/OutputIntent
> /S/GTS_PDFA1
> /OutputCondition(U.S. Web Coated \(SWOP\) v2)
> /OutputConditionIdentifier(CGATS TR 001)
> /Info(U.S. Web Coated \(SWOP\) v2)
> /DestOutputProfile 4 0 R
> >>
> 
> 4 0 obj
> 
> <<
> /N 4
> /Filter/FlateDecode
> /Length 389758
> >>
> stream
> ...
> endstream
> 
> endobj
> 
> 
> If you tell more what you're trying to do (one time only problem or 
> not?), maybe I can help...
> 
> Tilman
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
> 
> 
> 
> 
> 





