pdfbox-users mailing list archives

From <markus.sticker.e...@zf.com>
Subject Re: String replace failed
Date Fri, 07 Jun 2013 12:36:51 GMT
Hi,

Sorry for the delay (I was on vacation).
This week I tried to merge the text stripper with the text replacement code, but I could not get it to work,
because PDFTextStripper does not work the same way the ReplaceString sample does.
Could you kindly give me another hint?

Best regards

Markus



-----Original Message-----
From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
Sent: Friday, 17 May 2013 19:06
To: users@pdfbox.apache.org
Subject: Re: String replace failed

Hi Markus,

A little explanation of what is going on here:

1. Your text strings are encoded as PDF hex strings.
2. The PDF uses an encoding map.

So in order to get to the string, you need to look at the hex parts of the string and look
up the individual parts in the encoding map of the font that is used for the
text.

Example from the first page of your PDF

/F409 35 Tf
1 0 0 -1 170.59300232 240.93499756 Tm
[<00030004000500010006> -7 <000700080009000A000B>] TJ

This means that the font used is F409. The first code in the hex string is 0003. It corresponds to
the character map entry <0003> <003c>, which means that 0003 should be represented by
the Unicode character U+003C, the LESS-THAN SIGN (<).
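
As a minimal sketch of the lookup described above (plain Java, not PDFBox code): the class below splits a hex string into 2-byte codes and maps each code through a table. The names HexStringDecoder and TO_UNICODE are hypothetical. Only the <0003> -> <003c> entry comes from the example above, and the fixed 2-byte code width is an assumption based on that example.

```java
import java.util.HashMap;
import java.util.Map;

final class HexStringDecoder {

    // Hypothetical stand-in for the character map of font F409.
    // Only the <0003> -> <003c> entry is taken from the thread;
    // a real map would hold one entry per glyph used by the font.
    static final Map<Integer, Integer> TO_UNICODE = new HashMap<>();
    static {
        TO_UNICODE.put(0x0003, 0x003C);
    }

    /** Decodes the body of a PDF hex string (without angle brackets), assuming 2-byte codes. */
    static String decode(String hex, Map<Integer, Integer> toUnicode) {
        if (hex.length() % 4 != 0) {
            throw new IllegalArgumentException("expected a multiple of 4 hex digits: " + hex);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 4) {
            int code = Integer.parseInt(hex.substring(i, i + 4), 16);
            // Codes missing from the map become U+FFFD so that gaps stay visible.
            int codePoint = toUnicode.getOrDefault(code, 0xFFFD);
            out.appendCodePoint(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("0003", TO_UNICODE)); // prints "<"
    }
}
```

With the complete map of the font, the same loop would turn <00030004000500010006> into readable text. With only the single known entry, every other code decodes to U+FFFD.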

..

So in order to come up with a solution, one would need to combine the code used e.g. for ExtractText
with the ReplaceString example.

Unfortunately, as the description above shows, the ReplaceString example is overly
simplistic and only works under certain conditions.
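
To illustrate one of those conditions, here is a rough sketch of the reverse direction: turning replacement text back into codes through the inverted map. This is an assumption about how a combined solution might work, not something the ReplaceString example does. The class HexStringEncoder and the map entries in main are hypothetical, apart from <0003> -> <003c>, and 2-byte codes are assumed as in the example above.

```java
import java.util.HashMap;
import java.util.Map;

final class HexStringEncoder {

    /** Inverts a code -> Unicode map; if two codes share a code point, the first one found wins. */
    static Map<Integer, Integer> invert(Map<Integer, Integer> toUnicode) {
        Map<Integer, Integer> fromUnicode = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : toUnicode.entrySet()) {
            fromUnicode.putIfAbsent(e.getValue(), e.getKey());
        }
        return fromUnicode;
    }

    /** Encodes text as the body of a PDF hex string, assuming 2-byte codes. */
    static String encode(String text, Map<Integer, Integer> fromUnicode) {
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Integer code = fromUnicode.get(cp);
            if (code == null) {
                // The font's map has no code for this character, so it cannot be written.
                throw new IllegalArgumentException(
                        "no code for U+" + String.format("%04X", cp));
            }
            hex.append(String.format("%04X", code));
            i += Character.charCount(cp);
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> toUnicode = new HashMap<>();
        toUnicode.put(0x0003, 0x003C); // from the thread
        toUnicode.put(0x0010, 0x0041); // made-up entry for 'A', illustration only
        System.out.println(encode("<A", invert(toUnicode))); // prints "00030010"
    }
}
```

The exception path is the interesting part: if the map of the font in the PDF only covers the characters already present in the document, a replacement string containing any other character may not be encodable at all.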


As your PDF is produced by Apache FOP, couldn't you handle the replacement
on the PDF generation side? That would be much easier.
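
A minimal sketch of that idea, assuming you have access to the XSL-FO source as a string before it is handed to FOP (the class FoPlaceholder and its methods are hypothetical, and no FOP API is used here):

```java
final class FoPlaceholder {

    /** Escapes the characters that are not allowed as-is in XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Replaces every occurrence of the placeholder in the FO source with the escaped value. */
    static String fill(String foSource, String placeholder, String value) {
        return foSource.replace(placeholder, escapeXml(value));
    }

    public static void main(String[] args) {
        String fo = "<fo:block>##VERSION##</fo:block>";
        System.out.println(fill(fo, "##VERSION##", "Release 9.8.3.4 (12th April 2013)"));
        // prints "<fo:block>Release 9.8.3.4 (12th April 2013)</fo:block>"
    }
}
```

At this stage the text is still plain Unicode, so none of the hex string and encoding map handling is needed.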

BR
Maruan Sahyoun


On 17.05.2013 at 13:39, Maruan Sahyoun <sahyoun@fileaffairs.de> wrote:

> Hi Markus,
> 
> I can't look at it at the moment. I will get back to it later today.
> 
> BR
> Maruan
> 
> 
> On 17.05.2013 at 13:02, <markus.sticker.epos@zf.com> wrote:
> 
>> https://docs.google.com/file/d/0B9_jmweC39sxQTJycGNKdVVPWVk/edit?usp=sharing
>> Have a look at the log file.
>> 
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>> Sent: Friday, 17 May 2013 12:58
>> To: users@pdfbox.apache.org
>> Subject: Re: String replace failed
>> 
>> That's not easy :-)
>> 
>> You wrote "... parser returned an unreadable string ...". Which string are you
>> getting?
>> 
>> BR
>> Maruan
>> 
>> 
>> On 17.05.2013 at 12:37, markus.sticker.epos@zf.com wrote:
>> 
>>> My goal is to replace ##VERSION## with "Release 9.8.3.4 (12th April 2013)".
>>> It is on page 3.
>>> 
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>> Sent: Friday, 17 May 2013 12:33
>>> To: users@pdfbox.apache.org
>>> Subject: Re: String replace failed
>>> 
>>> Fine, I can extract the text. Could you describe what you are doing? E.g. which
>>> text would you like to replace? Do you have a sample code snippet to verify? Do you receive
>>> an error?
>>> 
>>> BR
>>> Maruan Sahyoun
>>> 
>>> On 17.05.2013 at 12:28, markus.sticker.epos@zf.com wrote:
>>> 
>>>> OK, here it is:
>>>> https://docs.google.com/file/d/0B9_jmweC39sxbmp2OXMtaXFTVG8/edit?usp=sharing
>>>> 
>>>> -----Original Message-----
>>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>>> Sent: Friday, 17 May 2013 12:20
>>>> To: users@pdfbox.apache.org
>>>> Subject: Re: String replace failed
>>>> 
>>>> Hi Markus,
>>>> 
>>>> No, the mailing list doesn't allow them. Could you upload the file somewhere
>>>> so we can download it?
>>>> 
>>>> BR
>>>> Maruan Sahyoun
>>>> 
>>>> On 17.05.2013 at 12:09, <markus.sticker.epos@zf.com> wrote:
>>>> 
>>>>> Sorry, our mail gateway may have removed the attachments.
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>>>>> Sent: Friday, 17 May 2013 11:59
>>>>> To: users@pdfbox.apache.org
>>>>> Subject: Re: String replace failed
>>>>> 
>>>>> Hi Markus,
>>>>> 
>>>>> Could you be a little more specific? Maybe with a sample PDF and some
>>>>> code? Replacing a string in a PDF can be much more complex than the ReplaceString example
>>>>> suggests.
>>>>> 
>>>>> BR
>>>>> Maruan Sahyoun
>>>>> 
>>>>> On 17.05.2013 at 11:41, <markus.sticker.epos@zf.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I tried to use the ReplaceString example, but the parser returned
>>>>>> an unreadable string.
>>>>>> I use the Java code as given in the example.
>>>>>> 
>>>>>> Best regards
>>>>>> 
>>>>>> Markus
>> 

