pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: String replace failed
Date Wed, 12 Jun 2013 11:52:02 GMT
Hi Markus,

what I meant was using the ReplaceString example as a base but looking at ExtractText for
how to put together the different bits which make up a visible string. As I already wrote,
this is a lot of effort and there are several potential issues. One is related to the fact
that only parts of a font might be embedded (font subsetting), so when you try to replace a
text string with another, the glyph (character) you need may not be available in the font.
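
To make the "getting the bits together" step more concrete, here is a minimal sketch in plain Java. It does not use any PDFBox API. It assumes that every character code in the hex string is two bytes wide (as in the content stream quoted further down in this thread) and that the font's encoding map has already been read into a `Map`. The class name `HexStringDecoder` and the method `decode` are hypothetical names chosen for illustration. The only mapping filled in is the one given in the thread, <0003> to <003c>.

```java
import java.util.HashMap;
import java.util.Map;

public class HexStringDecoder {

    /**
     * Splits the body of a PDF hex string (without the angle brackets) into
     * 2-byte character codes and looks each code up in the supplied map.
     * Codes without a mapping are rendered as U+FFFD. The fixed 2-byte code
     * width is an assumption of this sketch, not a general rule.
     */
    static String decode(String hex, Map<Integer, String> toUnicode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            int code = Integer.parseInt(hex.substring(i, i + 4), 16);
            out.append(toUnicode.getOrDefault(code, "\uFFFD"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> toUnicode = new HashMap<>();
        // The only entry given in the thread: <0003> <003c>
        toUnicode.put(0x0003, "<");
        System.out.println(decode("0003", toUnicode)); // prints <
    }
}
```

A real solution would have to do this for every string operand of every text-showing operator, using the map that belongs to the font selected at that point, which is the work ExtractText already does.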

As an example, let's say you needed 'ZF' to be printed in your PDF in, let's say, Frutiger. With
subsetting, only the glyphs for 'Z' and 'F' will be available in your PDF. Now if you try to
replace that with 'AF', the glyph for 'A' will not be available in the embedded font. That
would mean that you either need to get the information from the font, if the font is
still available to you, and add that to the form information (or create a new entry), OR
represent it with one of the built-in fonts, which means that the character is a new object. Now
when you try to extract the text it's no longer a consecutive string.
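
The 'ZF' versus 'AF' case can be sketched as a simple pre-check in plain Java. This is only an illustration under stated assumptions: the set of characters covered by the embedded subset is supplied by hand here, whereas in a real PDF it would have to be derived from the embedded font program. `SubsetCheck` and `missingGlyphs` are hypothetical names and not part of PDFBox.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SubsetCheck {

    /**
     * Returns the characters of the replacement text for which the
     * embedded font subset contains no glyph, in order of first appearance.
     */
    static Set<Character> missingGlyphs(String replacement, Set<Character> embedded) {
        Set<Character> missing = new LinkedHashSet<>();
        for (char c : replacement.toCharArray()) {
            if (!embedded.contains(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed subset: only the glyphs needed for 'ZF' were embedded.
        Set<Character> embedded = Set.of('Z', 'F');
        System.out.println(missingGlyphs("AF", embedded)); // prints [A]
    }
}
```

If the returned set is not empty, the replacement cannot be written with the embedded font alone, and one of the two workarounds described above is needed.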

So even if you put in all the effort, you might end up with a solution which works in 90% of
your cases but not 100%.

If you like you can contact me directly to discuss that further.

Maruan Sahyoun


On 07.06.2013 at 14:36, markus.sticker.epos@zf.com wrote:

> Hi,
> 
> sorry for the delay (vacation).
> This week I tried to merge the stripper with the text replacement, but I couldn't get it to work,
> because the PDFTextStripper doesn't work the same way the ReplaceString sample does.
> Maybe you could be so kind as to give me another hint.
> 
> Best regards
> 
> Markus
> 
> 
> 
> -----Original Message-----
> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
> Sent: Friday, 17 May 2013 19:06
> To: users@pdfbox.apache.org
> Subject: Re: String replace failed
> 
> Hi Markus,
> 
> a little explanation of what goes on here:
> 
> 1. Your text strings are encoded as PDF hex strings.
> 2. The PDF uses an encoding map.
> 
> So in order to get to the string you need to look at the hex parts of the string and
> look up the individual parts in the encoding map of the font which is used
> for the text.
> 
> Example from the first page of your PDF
> 
> /F409 35 Tf
> 1 0 0 -1 170.59300232 240.93499756 Tm [<00030004000500010006> -7 <000700080009000A000B>] TJ
> 
> This means that the font used is F409. The first hex sequence is 0003. That corresponds
> to the character map entry <0003> <003c>, which means that 0003 should be represented using
> the Unicode character U+003C, which is the LESS-THAN SIGN (<).
> 
> ..
> 
> So in order to come up with a solution one would need to combine the code used e.g. for
> ExtractText with the ReplaceString example.
> 
> Unfortunately, as can be seen from the description above, the ReplaceString example is overly
> simplistic and only works under certain conditions.
> 
> 
> As the PDF you have is being produced using Apache FOP, couldn't you handle the replacement
> on the PDF generation side? That would be much easier.
> 
> BR
> Maruan Sahyoun
> 
> 
> On 17.05.2013 at 13:39, Maruan Sahyoun <sahyoun@fileaffairs.de> wrote:
> 
>> Hi Markus,
>> 
>> I can't look at it at the moment. I will get back to it later today.
>> 
>> BR
>> Maruan
>> 
>> 
>> On 17.05.2013 at 13:02, <markus.sticker.epos@zf.com> wrote:
>> 
>>> https://docs.google.com/file/d/0B9_jmweC39sxQTJycGNKdVVPWVk/edit?usp=sharing
>>> Have a look at the log file.
>>> 
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>> Sent: Friday, 17 May 2013 12:58
>>> To: users@pdfbox.apache.org
>>> Subject: Re: String replace failed
>>> 
>>> That's not easy :-)
>>> 
>>> You wrote "... parser returned an unreadable string ...". Which string are you
>>> getting?
>>> 
>>> BR
>>> Maruan
>>> 
>>> 
>>> On 17.05.2013 at 12:37, markus.sticker.epos@zf.com wrote:
>>> 
>>>> My target is to replace ##VERSION## with "Release 9.8.3.4 (12th April 2013)"
>>>> It's on Page 3.
>>>> 
>>>> -----Original Message-----
>>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>>> Sent: Friday, 17 May 2013 12:33
>>>> To: users@pdfbox.apache.org
>>>> Subject: Re: String replace failed
>>>> 
>>>> Fine, I can extract the text. Could you describe what you are doing? E.g.
>>>> which text would you like to replace? Do you have a sample code snippet to verify? Do you
>>>> receive an error?
>>>> 
>>>> BR
>>>> Maruan Sahyoun
>>>> 
>>>> On 17.05.2013 at 12:28, markus.sticker.epos@zf.com wrote:
>>>> 
>>>>> OK.... here it is
>>>>> https://docs.google.com/file/d/0B9_jmweC39sxbmp2OXMtaXFTVG8/edit?usp=sharing
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>>>> Sent: Friday, 17 May 2013 12:20
>>>>> To: users@pdfbox.apache.org
>>>>> Subject: Re: String replace failed
>>>>> 
>>>>> Hi Markus,
>>>>> 
>>>>> No, the mailing list doesn't allow them. Could you upload the file somewhere
>>>>> so we can download it?
>>>>> 
>>>>> BR
>>>>> Maruan Sahyoun
>>>>> 
>>>>> On 17.05.2013 at 12:09, <markus.sticker.epos@zf.com> wrote:
>>>>> 
>>>>>> Sorry, maybe our mail gateway removes attachments.
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>>>>> Sent: Friday, 17 May 2013 11:59
>>>>>> To: users@pdfbox.apache.org
>>>>>> Subject: Re: String replace failed
>>>>>> 
>>>>>> Hi Markus,
>>>>>> 
>>>>>> could you be a little more specific? Maybe with a sample PDF and
>>>>>> some code? Replacing a string in a PDF can be much more complex than the ReplaceString example
>>>>>> suggests.
>>>>>> 
>>>>>> BR
>>>>>> Maruan Sahyoun
>>>>>> 
>>>>>> On 17.05.2013 at 11:41, <markus.sticker.epos@zf.com> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I tried to use the String replace example, but the parser returned
>>>>>>> an unreadable string.
>>>>>>> I use the Java code as in the example.
>>>>>>> 
>>>>>>> Best regards
>>>>>>> 
>>>>>>> Markus
>>> 
> 

