pdfbox-users mailing list archives

From Sam Karpouzos <skarp...@hotmail.com>
Subject Working with Type 3 fonts
Date Thu, 20 Dec 2012 01:06:11 GMT

I have created an application that uses PDFBox to search the text of PDF files for specific
words. When a match is found, I replace the word with a masked version.
All of the PDFs contain custom embedded fonts. My problem is that some of the fonts
are subsetted and lack the characters needed for the masked version.
For Type 1 fonts, I have been able to load the PFB files and replace the font with an un-subsetted
version. I was wondering what my options are for PDFs that use Type 3 fonts.
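
To make the subsetting problem concrete, here is a minimal sketch in plain Java (no PDFBox calls) of the check that fails for a subsetted font: which characters of the masked word have no glyph in the subset. The class name, method name and the hard-coded glyph set are all hypothetical; in the real application the set of available characters would have to be read from the embedded font.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of PDFBox.
class SubsetCheck {

    // Returns the characters of 'text' that are not in 'available',
    // in order of first appearance.
    static Set<Character> missingGlyphs(String text, Set<Character> available) {
        Set<Character> missing = new LinkedHashSet<>();
        for (char c : text.toCharArray()) {
            if (!available.contains(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed subset: only the glyphs the original document text needed.
        Set<Character> subset = new HashSet<>(Arrays.asList('S', 'e', 'c', 'r', 't'));

        // The masked version introduces '*', which the subset does not contain.
        System.out.println(missingGlyphs("S****t", subset)); // prints [*]
    }
}
```

Any character this reports as missing is one the replacement font (or a modified Type 3 font) would have to supply.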
I was thinking of one of the following:

1) Create a Type 1 font that looks like the Type 3 font and load that. ISSUE: Not sure if
   this is possible or how easy it would be.
2a) Load a Type 3 font from a file. ISSUE: I don't see anything other than the
   Type3StreamParser, but it works off of a COSStream. In addition, I don't have the original
   fonts, so I would need to extract the font from a PDF and save it to a file. Not sure if
   that is possible.
2b) Load the dictionary of a template PDF that contains the un-subsetted Type 3 font, get the
   font object from the PDResources map and add it to the PDF I am modifying. ISSUE: I tried
   this for the Type 1 fonts originally and found that it only worked for the first PDF. I'm
   assuming the font object would need to be cloned or copied so that it does not become invalid.
3) Modify the existing Type 3 font and add the extra characters right in the PDF. ISSUE: Again,
   not sure if something like this is possible.

Any help would be greatly appreciated. 		 	   		  