pdfbox-users mailing list archives

From Colette Joubarne <cjouba...@privacyanalytics.ca>
Subject RE: Unable to mark document as tagged
Date Fri, 13 Jun 2014 17:41:02 GMT
Maruan,

I use the parser to tokenize the content and then loop through the tokens. If a token is a TJ or Tj operator,
I grab the text, in certain cases replace some of the text (letter by letter, maintaining
the existing structure), and add these tokens to a new token list. If it is not a TJ or Tj
operator, I just copy the token to the new token list. I then write the token list to the document
and save it.
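The loop described above can be sketched roughly as follows. This is a simplified, self-contained model, not PDFBox code. `Op` and `Str` are hypothetical stand-ins for PDFBox's operator and `COSString` tokens, and other operands (names, dictionaries, numbers) are kept as plain Java objects. In a content stream the operands come *before* their operator, so the sketch buffers operands until the operator arrives and only then decides whether to rewrite them. Marked-content operators such as `BDC`/`EMC`, which carry the `/MCID` values that link content to the structure tree, pass through untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Simplified model of a content-stream rewriting loop (not the PDFBox API).
 * Operands precede their operator, so they are buffered until the operator arrives.
 */
public class TokenRewrite {

    /** Hypothetical stand-in for an operator token such as Tj, TJ, BDC or EMC. */
    public static final class Op {
        final String name;
        public Op(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    /** Hypothetical stand-in for a string operand (PDFBox models these as COSString). */
    public static final class Str {
        final String text;
        public Str(String text) { this.text = text; }
        @Override public String toString() { return "(" + text + ")"; }
    }

    /** Returns a new token list in which only the text shown by Tj/TJ is replaced. */
    public static List<Object> rewrite(List<Object> tokens, UnaryOperator<String> replace) {
        List<Object> out = new ArrayList<>(tokens.size());
        List<Object> operands = new ArrayList<>(); // operands seen since the last operator
        for (Object token : tokens) {
            if (!(token instanceof Op)) {
                operands.add(token);
                continue;
            }
            Op op = (Op) token;
            boolean showsText = op.name.equals("Tj") || op.name.equals("TJ");
            for (Object operand : operands) {
                // Non-text operators (BT, BDC, EMC, Tf, ...) keep their operands verbatim.
                out.add(showsText ? replaceText(operand, replace) : operand);
            }
            operands.clear();
            out.add(op);
        }
        out.addAll(operands); // any trailing operands are copied unchanged
        return out;
    }

    /** Tj takes one string; TJ takes an array mixing strings and kerning numbers. */
    private static Object replaceText(Object operand, UnaryOperator<String> replace) {
        if (operand instanceof Str) {
            return new Str(replace.apply(((Str) operand).text));
        }
        if (operand instanceof List<?>) {
            List<Object> array = new ArrayList<>();
            for (Object element : (List<?>) operand) {
                array.add(element instanceof Str ? replaceText(element, replace) : element);
            }
            return array;
        }
        return operand;
    }
}
```

For example, rewriting `[BT, /P, <</MCID 0>>, BDC, (James), Tj, EMC, ET]` with `s -> s.replaceAll("[A-Za-z]", "x")` yields `[BT, /P, <</MCID 0>>, BDC, (xxxxx), Tj, EMC, ET]`: the marked-content operators and their `/MCID` survive, which fits with the page still rendering correctly. Real `Tj` operands are byte strings in the font's encoding, so letter-by-letter replacement on decoded text is only safe for simple single-byte encodings.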

If I am corrupting the structure, how is it that the document displays correctly?

Colette

-----Original Message-----
From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de] 
Sent: June-13-14 12:54 PM
To: users@pdfbox.apache.org
Subject: Re: Unable to mark document as tagged

Hi Colette,

the modified version does not contain the structure information needed for tagged PDFs. How do you create the modified version from the first one?

BR
Maruan
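Maruan's point, and the question of why the file still displays correctly, can be illustrated with the catalog entries involved. Rendering uses the page content streams, while tagging is declared in the document catalog: under the PDF specification, a Tagged PDF is expected to carry a `/MarkInfo` dictionary with `/Marked true` *and* a `/StructTreeRoot`. The sketch below is a simplified model, not the PDFBox API: dictionaries are Java `Map`s and `Ref` is a hypothetical stand-in for an indirect reference such as `3 0 R`. In PDFBox itself, the corresponding accessors on `PDDocumentCatalog` are `getMarkInfo()` and `getStructureTreeRoot()`. How exactly Adobe Reader decides what to show in its properties dialog is an assumption here, not something this sketch reproduces.

```java
import java.util.Map;

/**
 * Simplified model (not the PDFBox API) of the catalog entries that make a PDF "tagged":
 * /MarkInfo with /Marked true AND a /StructTreeRoot. Dictionaries are Maps,
 * and an indirect reference such as "3 0 R" is resolved through an object table.
 */
public class TaggedCheck {

    /** Hypothetical indirect reference; key "3 0" stands for "3 0 R". */
    public static final class Ref {
        final String key;
        public Ref(String key) { this.key = key; }
    }

    /** Follows an indirect reference; direct values (and null) are returned unchanged. */
    static Object resolve(Object value, Map<String, Object> objects) {
        return value instanceof Ref ? objects.get(((Ref) value).key) : value;
    }

    /** True when /MarkInfo, whether inline or indirect, has /Marked true. */
    public static boolean isMarked(Map<String, Object> catalog, Map<String, Object> objects) {
        Object markInfo = resolve(catalog.get("MarkInfo"), objects);
        return markInfo instanceof Map<?, ?>
                && Boolean.TRUE.equals(((Map<?, ?>) markInfo).get("Marked"));
    }

    /** /Marked alone is not enough: the structure tree root must be present too. */
    public static boolean isTagged(Map<String, Object> catalog, Map<String, Object> objects) {
        return isMarked(catalog, objects)
                && resolve(catalog.get("StructTreeRoot"), objects) != null;
    }
}
```

Applied to the two catalogs quoted in this thread, the original (with `/StructTreeRoot 10 0 R`) passes `isTagged`, while the one written for `newDoc` has `/Marked true` in `3 0 obj` but no `/StructTreeRoot`, so `isMarked` is true and `isTagged` is false. Writing `/MarkInfo` as the indirect `3 0 R` is not the problem in itself. This suggests the structure tree was never carried over to `newDoc = new PDDocument()`; one option to try is loading the original, rewriting its page contents in place, and saving under a new name, so the existing catalog entries survive. (The check in the first message reads `PTConstants.pdfDoc`, the source document, so it says nothing about `newDoc`.)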

On 13.06.2014 at 17:48, Colette Joubarne <cjoubarne@privacyanalytics.ca> wrote:

> Maruan,
> 
> I am copying the entire structure from a tagged document and just replacing some of the
> text, so I would think that the structure is unchanged. Then again, who knows what I might
> have messed up.
> 
> James.pdf is the original file:
> https://dl.dropboxusercontent.com/u/7689859/James.pdf
> 
> James-mod.pdf is the modified file:
> https://dl.dropboxusercontent.com/u/7689859/James-mod.pdf
> 
> Colette
> 
> -----Original Message-----
> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de] 
> Sent: June-13-14 10:45 AM
> To: users@pdfbox.apache.org
> Subject: Re: Unable to mark document as tagged
> 
> Hi Colette,
> 
> this information alone doesn't make a document a tagged PDF! You might not have the structure
> information needed within your PDF. Would you have a working and a non-working sample which you
> could upload to a public location, as attachments are not allowed on the mailing list?
> 
> BR
> Maruan
> 
> On 13.06.2014 at 15:44, Colette Joubarne <cjoubarne@privacyanalytics.ca> wrote:
> 
>> Maruan,
>> 
>> Yes, you are right. However, why is it that when I look at the document properties in Adobe
>> Reader, it indicates that the document is not tagged?
>> 
>> 3 0 obj
>> <<
>> /Marked true
>> >>
>> 
>> Colette
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de] 
>> Sent: June-13-14 9:19 AM
>> To: users@pdfbox.apache.org
>> Subject: Re: Unable to mark document as tagged
>> 
>> Dear Colette,
>> 
>> /MarkInfo 3 0 R indicates that the information you are looking for is referenced
>> and should be available in 3 0 obj. Could you verify that?
>> 
>> With kind regards
>> 
>> Maruan
>> 
>> On 13.06.2014 at 14:21, Colette Joubarne <cjoubarne@privacyanalytics.ca> wrote:
>> 
>>> I have a tagged PDF document with the following catalog:
>>> 
>>>          /Type/Catalog/Pages 2 0 R/Lang(en-CA) /StructTreeRoot 10 0 R/MarkInfo<</Marked true
>>> 
>>> I read in the contents, replace some of the text, and create a new document. I copy
>>> the document information from the original document and set Marked to true.
>>> 
>>>          newDoc = new PDDocument();
>>>          newDoc.setDocumentInformation(PTConstants.pdfDoc.getDocumentInformation());
>>> 
>>>          PDMarkInfo markinfo = new PDMarkInfo();
>>>          markinfo.setMarked(true);
>>>          newDoc.getDocumentCatalog().setMarkInfo(markinfo);
>>> 
>>> and when I check that it was set, it returns true:
>>> 
>>>    PDMarkInfo markInfo = PTConstants.pdfDoc.getDocumentCatalog().getMarkInfo();
>>>    if ((markInfo != null) && (markInfo.isMarked())) System.out.println("true");
>>> 
>>> But, while the resulting document displays correctly, its catalog indicates that
>>> it is not tagged:
>>> 
>>> /Type /Catalog
>>> /Version /1.4
>>> /Pages 2 0 R
>>> /MarkInfo 3 0 R
>>> 
>>> Any idea what is going on?
>>> 
>>> Colette
>> 
> 

