pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Unable to mark document as tagged
Date Fri, 13 Jun 2014 17:52:20 GMT
Colette,

you are not corrupting the PDF document, but the structure information needed for a tagged PDF
is missing from the modified file.

Maruan Sahyoun

> On 13.06.2014 at 19:41, Colette Joubarne <cjoubarne@privacyanalytics.ca> wrote:
> 
> Maruan,
> 
> I use the parser to tokenize, and then loop through the tokens. If a token is a TJ or Tj
> operator, I grab the text, in certain cases replace some of it (letter by letter,
> maintaining the existing structure), and add these tokens to a new token list. If it is not
> a TJ or Tj operator, I just copy the token to the new token list. I then write the token list
> to the doc and save.
> 
> If I am corrupting the structure, how is it that the document displays correctly?
> 
> Colette
> 
> -----Original Message-----
> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de] 
> Sent: June-13-14 12:54 PM
> To: users@pdfbox.apache.org
> Subject: Re: Unable to mark document as tagged
> 
> Hi Colette,
> 
> the modified version does not contain the structure information needed for tagged PDFs.
> How do you create the modified version from the first one?
> 
> BR
> Maruan
> 
>> On 13.06.2014 at 17:48, Colette Joubarne <cjoubarne@privacyanalytics.ca> wrote:
>> 
>> Maruan,
>> 
>> I am copying the entire structure from a tagged document and just replacing some
>> of the text, so I would think that the structure is unchanged. Then again, who knows what I
>> might have messed up.
>> 
>> James.pdf is the original file:
>> https://dl.dropboxusercontent.com/u/7689859/James.pdf
>> 
>> James-mod.pdf is the modified file:
>> https://dl.dropboxusercontent.com/u/7689859/James-mod.pdf
>> 
>> Colette
>> 
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de] 
>> Sent: June-13-14 10:45 AM
>> To: users@pdfbox.apache.org
>> Subject: Re: Unable to mark document as tagged
>> 
>> Hi Colette,
>> 
>> this information alone doesn't make a document a tagged PDF! Your PDF might not have the
>> structure information it needs. Would you have a pair of working and non-working samples
>> that you could upload to a public location? Attachments are not allowed on the mailing
>> list.
>> 
>> BR
>> Maruan
>> 
>>> On 13.06.2014 at 15:44, Colette Joubarne <cjoubarne@privacyanalytics.ca> wrote:
>>> 
>>> Maruan,
>>> 
>>> Yes, you are right. However, why is it that when I look at the properties in Adobe
>>> Reader, it indicates that the document is not tagged?
>>> 
>>> 3 0 obj
>>> <<
>>> /Marked true
>>> 
>>> Colette
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de] 
>>> Sent: June-13-14 9:19 AM
>>> To: users@pdfbox.apache.org
>>> Subject: Re: Unable to mark document as tagged
>>> 
>>> Dear Colette,
>>> 
>>> /MarkInfo 3 0 R indicates that the information you are looking for is referenced
>>> and should be available in 3 0 obj. Could you verify that?
>>> 
>>> With kind regards
>>> 
>>> Maruan
>>> 
>>>> On 13.06.2014 at 14:21, Colette Joubarne <cjoubarne@privacyanalytics.ca> wrote:
>>>> 
>>>> I have a tagged pdf doc with the following header:
>>>> 
>>>>         /Type/Catalog/Pages 2 0 R/Lang(en-CA) /StructTreeRoot 10 0 R/MarkInfo<</Marked true
>>>> 
>>>> I read in the contents, replace some of the text and create a new doc. I
>>>> copy the document information from the original doc and set marked to true.
>>>> 
>>>>         newDoc = new PDDocument();
>>>>         newDoc.setDocumentInformation(PTConstants.pdfDoc.getDocumentInformation());
>>>> 
>>>>         PDMarkInfo markinfo = new PDMarkInfo();
>>>>         markinfo.setMarked(true);
>>>>         newDoc.getDocumentCatalog().setMarkInfo(markinfo);
>>>> 
>>>> and when I check that it was set, it returns true:
>>>> 
>>>>   PDMarkInfo markInfo = PTConstants.pdfDoc.getDocumentCatalog().getMarkInfo();
>>>>   if ((markInfo != null) && (markInfo.isMarked())) System.out.println("true");
>>>> 
>>>> But, while the resulting document displays correctly, the header indicates
>>>> that it is not tagged:
>>>> 
>>>> /Type /Catalog
>>>> /Version /1.4
>>>> /Pages 2 0 R
>>>> /MarkInfo 3 0 R
>>>> 
>>>> Any idea what is going on?
>>>> 
>>>> Colette
> 
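
As a rough illustration of the point made in this thread: setting /Marked true in
/MarkInfo is not enough on its own, because the catalog also needs a /StructTreeRoot
entry pointing at the structure tree. The sketch below is plain Java with no PDFBox
dependency. It only pattern-matches the dictionary text quoted in the thread, so it is
a simplification and not a real PDF parser. The class and method names
(CatalogTagCheck, looksTagged) are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: checks raw dictionary text for the
// two entries discussed in the thread.
final class CatalogTagCheck {
    // "/Marked true", whether inline in the catalog or in a referenced object.
    private static final Pattern MARKED_TRUE =
            Pattern.compile("/Marked\\s*true\\b");
    // "/StructTreeRoot" followed by an indirect reference such as "10 0 R".
    private static final Pattern STRUCT_TREE_ROOT =
            Pattern.compile("/StructTreeRoot\\s*\\d+\\s+\\d+\\s+R\\b");

    static boolean hasMarkedTrue(String dictText) {
        return MARKED_TRUE.matcher(dictText).find();
    }

    static boolean hasStructTreeRoot(String catalogText) {
        return STRUCT_TREE_ROOT.matcher(catalogText).find();
    }

    // catalogText: text of the catalog dictionary.
    // markInfoText: text of the mark info dictionary; pass the catalog text
    // again when /MarkInfo is written inline.
    static boolean looksTagged(String catalogText, String markInfoText) {
        return hasStructTreeRoot(catalogText) && hasMarkedTrue(markInfoText);
    }

    public static void main(String[] args) {
        // Catalog of the original file, as quoted in the first message.
        String original =
                "/Type/Catalog/Pages 2 0 R/Lang(en-CA) /StructTreeRoot 10 0 R/MarkInfo<</Marked true";
        // Catalog and object 3 of the rebuilt file, as quoted in the thread.
        String rebuilt = "/Type /Catalog /Version /1.4 /Pages 2 0 R /MarkInfo 3 0 R";
        String object3 = "<< /Marked true";

        System.out.println("original looks tagged: " + looksTagged(original, original));
        System.out.println("rebuilt looks tagged:  " + looksTagged(rebuilt, object3));
    }
}
```

With the snippets quoted in this thread, the original catalog passes both checks, while
the rebuilt one has /Marked true in object 3 but no /StructTreeRoot, which matches what
Maruan describes. Note that a new PDDocument starts with an empty catalog, so
anything not copied over explicitly will be absent from the saved file.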
