So replacing text in any PDF is going to be very complicated, if I want
to keep the formatting the same.
I put some print statements in ReplaceString.java. I never see the text
(even in pieces) for which I am looking!
However, PDFTextStripper does produce it. This seems bizarre!
This document has eight PDF streams in it. Could this be the reason?
Thanks, Alan
Subject: Re: ReplaceString example <http://markmail.org/message/ygcdrv2o4zg5iqq2>
From: Andreas Lehmkuehler (andr...@lehmi.de)
Date: Dec 30, 2010 2:18:44 am
List: org.apache.pdfbox.users
Hi,
On 29.12.2010 03:33, Alan Thomas wrote:
> I used the ReplaceString example that comes with PDFBox on a PDF file I
> have. However, it does not find the text I want to replace.
> In looking at the code and putting in some debugging statements, I found
> out that the code was looking for a "PDFOperator" operation
Correct.
> (from the getOperation() method) of "Tj" and "TJ". However, my PDF file has
> neither.
> Question: Where can I find the list of all the operators that display
> strings in a PDF file? (Or is there an easier way to search and replace
> strings?)
Text content may be defined in different ways within PDFs. In most cases the
text is split into several chunks. These often consist of one or more
characters, but not necessarily whole words or lines of text. Consequently,
one has to combine all these text chunks to identify the given text. The
PDFTextStripper class [1] works like that.
Have a look at section 9.3, "Text State Parameters and Operators", of the PDF
reference at [2] for further information.
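To make the chunk-combining idea concrete, here is a minimal sketch in plain Java. It does not use the PDFBox API: the ChunkSearch and Chunk classes and the find method are hypothetical names invented for this illustration, and each chunk is assumed to be already decoded into readable characters (in a real PDF the string operands are character codes that first have to be mapped through the font's encoding). The operator set lists the text-showing operators of the PDF specification, which are Tj, TJ, ' and ".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ChunkSearch {
    // Text-showing operators defined by the PDF specification.
    static final Set<String> TEXT_SHOWING =
            new HashSet<>(Arrays.asList("Tj", "TJ", "'", "\""));

    /** Hypothetical stand-in for one operator and its already decoded text. */
    static final class Chunk {
        final String operator;
        final String text;

        Chunk(String operator, String text) {
            this.operator = operator;
            this.text = text;
        }
    }

    /**
     * Joins the text of all text-showing chunks and searches the result.
     * Returns {firstChunkIndex, lastChunkIndex} of the chunks that the first
     * match spans, or null if the needle is empty or not found.
     */
    static int[] find(List<Chunk> chunks, String needle) {
        if (needle.isEmpty()) {
            return null;
        }
        StringBuilder joined = new StringBuilder();
        // owner.get(i) is the index of the chunk that produced character i.
        List<Integer> owner = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            Chunk c = chunks.get(i);
            if (!TEXT_SHOWING.contains(c.operator)) {
                continue; // positioning and state operators carry no text
            }
            joined.append(c.text);
            for (int k = 0; k < c.text.length(); k++) {
                owner.add(i);
            }
        }
        int start = joined.indexOf(needle);
        if (start < 0) {
            return null;
        }
        int end = start + needle.length() - 1;
        return new int[] { owner.get(start), owner.get(end) };
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
                new Chunk("Tj", "Hel"),
                new Chunk("Td", ""),      // a positioning operator, skipped
                new Chunk("TJ", "lo Wo"),
                new Chunk("Tj", "rld"));
        int[] m = find(chunks, "lo World");
        System.out.println(m[0] + ".." + m[1]);
    }
}
```

The search string "lo World" never appears in any single chunk, which is why checking each Tj or TJ operand in isolation finds nothing. Knowing which chunks a match spans is only the first step: an actual replacement would still have to rewrite those operands without disturbing the positioning operators between them.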
BR
Andreas Lehmkühler
[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/PDFTextStripper.java
[2] http://www.adobe.com/devnet/pdf/pdf_reference.html