pdfbox-users mailing list archives

From "Alan Thomas" <jalantho...@verizon.net>
Subject Re: ReplaceString example
Date Tue, 11 Jan 2011 00:18:28 GMT
So replacing text in any PDF is going to be very complicated if I want to keep the formatting the same.

I put some print statements in ReplaceString.java. I never see the text (even in pieces) for which I am looking!

However, PDFTextStripper does produce it. This seems bizarre!
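
Nothing in the thread explains why the printed operands never contain the text. One possible cause, offered only as an assumption about this particular file, is that the strings are written as hex strings of font-specific character codes (for example two-byte codes, as used with an Identity-H encoded Type 0 font). Their raw bytes then do not spell out the text, whereas a text extractor maps each code to Unicode through the font, for instance via its /ToUnicode CMap. The self-contained Java sketch below illustrates this; the class name HexStringDecode and the code-to-Unicode table are hypothetical stand-ins, not PDFBox API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Sketch of one possible reason printed operands never show the searched text:
// the operand is a hex string of font-specific 2-byte codes, not characters.
// The code-to-Unicode table in main is hypothetical; in a real file it would
// come from the font (for example its /ToUnicode CMap).
public class HexStringDecode {

    // Decode a PDF hex string such as "<48656C6C6F>". Whitespace is ignored and
    // an odd number of digits gets a final 0 appended, as the PDF spec describes.
    static byte[] hexToBytes(String hex) {
        String digits = hex.replaceAll("[<>\\s]", "");
        if (digits.length() % 2 != 0) {
            digits += "0";
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < digits.length(); i += 2) {
            out.write(Integer.parseInt(digits.substring(i, i + 2), 16));
        }
        return out.toByteArray();
    }

    // Read the bytes as big-endian 2-byte codes and map each through the table;
    // unknown codes become '?'.
    static String decodeTwoByte(byte[] bytes, Map<Integer, Character> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < bytes.length; i += 2) {
            int code = ((bytes[i] & 0xFF) << 8) | (bytes[i + 1] & 0xFF);
            sb.append(toUnicode.getOrDefault(code, '?'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = hexToBytes("<002B0048004F004F0052>");
        Map<Integer, Character> hypotheticalToUnicode =
                Map.of(0x2B, 'H', 0x48, 'e', 0x4F, 'l', 0x52, 'o');
        String naive = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println("raw bytes contain Hello? " + naive.contains("Hello"));
        System.out.println("decoded: " + decodeTwoByte(raw, hypotheticalToUnicode));
    }
}
```

Running main shows that the raw bytes of <002B0048004F004F0052> do not contain "Hello", while mapping the two-byte codes through the hypothetical table yields "Hello". If something like this applies to the file, comparing operand bytes with the search text cannot succeed without first decoding them through the font.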

 

This document has eight PDF streams in it. Could this be the reason?

Thanks, Alan

Subject: Re: ReplaceString example <http://markmail.org/message/ygcdrv2o4zg5iqq2>
From: Andreas Lehmkuehler (andr...@lehmi.de)
Date: Dec 30, 2010 2:18:44 am
List: org.apache.pdfbox.users

Hi,

On 29.12.2010 03:33, Alan Thomas wrote:
> I used the ReplaceString example that comes with PDFBox on a PDF file I have. However, it does not find the text I want to replace.
>
> In looking at the code and putting in some debugging statements, I found out that the code was looking for a "PDFOperator" operation (from the getOperation() method) of "Tj" and "TJ".

Correct.

> However, my PDF file has neither.
>
> Question: Where can I find the list of all the operators that display strings in a PDF file? (Or is there an easier way to search and replace strings?)

Text content may be defined in different ways within PDFs. In most cases the text will be split into several chunks. These often consist of one or more characters, but not necessarily whole words or lines of text. Consequently, one has to combine all these text chunks to identify the given text. The PDFTextStripper class [1] works like that.
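
The idea of combining chunks can be illustrated without PDFBox. The Java sketch below (ChunkSearch is a hypothetical name, not a PDFBox class) joins already-decoded text chunks, searches the joined text, and maps a match back to the chunks it spans. It is far simpler than PDFTextStripper: it assumes the chunks are already Unicode text in reading order and does not use glyph positions to decide where spaces or line breaks belong.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: locate a target string that may span several text chunks (the string
// operands of text-showing operators) by joining the chunks and mapping the
// match back to chunk indices. Assumes the chunks are already decoded to Unicode
// and in reading order; no spaces are inferred between chunks.
public class ChunkSearch {

    /**
     * Returns {firstChunk, offsetInFirst, lastChunk, offsetInLast} for the first
     * match (both offsets inclusive), or null if the target does not occur.
     */
    static int[] find(List<String> chunks, String target) {
        if (target.isEmpty()) {
            return null;
        }
        StringBuilder joined = new StringBuilder();
        List<Integer> starts = new ArrayList<>(); // start index of each chunk in joined
        for (String chunk : chunks) {
            starts.add(joined.length());
            joined.append(chunk);
        }
        int at = joined.indexOf(target);
        if (at < 0) {
            return null;
        }
        int[] first = locate(starts, at);
        int[] last = locate(starts, at + target.length() - 1);
        return new int[] {first[0], first[1], last[0], last[1]};
    }

    // Map an index in the joined text back to {chunkIndex, offsetInChunk}.
    // Scanning from the end picks the last chunk starting at or before the index,
    // which skips empty chunks that share the same start.
    private static int[] locate(List<Integer> starts, int index) {
        for (int k = starts.size() - 1; k >= 0; k--) {
            if (starts.get(k) <= index) {
                return new int[] {k, index - starts.get(k)};
            }
        }
        throw new IllegalArgumentException("index before first chunk: " + index);
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("Hel", "lo Wo", "rld");
        boolean inOneChunk = chunks.stream().anyMatch(c -> c.contains("World"));
        System.out.println("found in a single chunk: " + inOneChunk);
        System.out.println("found across chunks: " + Arrays.toString(find(chunks, "World")));
    }
}
```

For the chunks "Hel", "lo Wo" and "rld", no single chunk contains "World", so a search that inspects each operand on its own finds nothing, but find returns {1, 3, 2, 2}: the match starts at offset 3 of chunk 1 and ends at offset 2 of chunk 2. Replacing the text would then mean editing every chunk in that range, which is one reason replacement that keeps the formatting is hard.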

 

Have a look at section 9.3, "Text State Parameters and Operators", of the PDF reference [2] for further information.
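
On the original question: the operators that show strings are Tj, TJ, ' (apostrophe) and " (double quote); TJ takes an array that mixes strings with numeric position adjustments. The self-contained Java sketch below (TextShowingOps is a hypothetical name, not part of PDFBox) tokenizes a simplified, already-decompressed content stream and collects the string operands of all four operators in order. It only recognizes literal strings, not hex strings or inline images, and treats a backslash as escaping the next character literally, so \n and octal escapes are not decoded.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: collect, in order, the string operands of the four text-showing
// operators (Tj, TJ, ' and ") from a simplified, uncompressed content stream.
// Assumptions: only literal strings "(...)" are recognized (no hex strings or
// inline images), a backslash escapes the next character literally (\n and
// octal escapes are NOT decoded), and bytes map directly to characters.
public class TextShowingOps {

    static boolean isTextShowing(String operator) {
        return operator.equals("Tj") || operator.equals("TJ")
                || operator.equals("'") || operator.equals("\"");
    }

    static List<String> extractChunks(String content) {
        List<String> chunks = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // strings seen since the last operator
        int i = 0;
        int n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c) || c == '[' || c == ']') {
                i++; // brackets only group the TJ array; its strings stay pending
            } else if (c == '(') {
                StringBuilder sb = new StringBuilder();
                int depth = 1; // literal strings may contain balanced parentheses
                i++;
                while (i < n && depth > 0) {
                    char d = content.charAt(i);
                    if (d == '\\' && i + 1 < n) {
                        sb.append(content.charAt(i + 1));
                        i += 2;
                        continue;
                    }
                    if (d == '(') depth++;
                    if (d == ')') depth--;
                    if (depth > 0) sb.append(d);
                    i++;
                }
                pending.add(sb.toString());
            } else {
                int start = i;
                while (i < n && !Character.isWhitespace(content.charAt(i))
                        && "()[]".indexOf(content.charAt(i)) < 0) {
                    i++;
                }
                if (i == start) { // a stray ')' outside any string: skip it
                    i++;
                    continue;
                }
                String token = content.substring(start, i);
                boolean operand = token.startsWith("/") || token.matches("[-+]?[0-9.]+");
                if (!operand) { // anything else is treated as an operator
                    if (isTextShowing(token)) {
                        chunks.addAll(pending);
                    }
                    pending.clear();
                }
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String stream = "BT /F1 12 Tf 72 700 Td (Hel) Tj [(lo W) -20 (or)] TJ (ld) Tj ET";
        System.out.println(extractChunks(stream));
    }
}
```

For the sample stream in main the result is [Hel, lo W, or, ld]. No single chunk holds "World"; only the combined text "Hello World" does, which is why the chunks have to be joined before searching.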

 

BR
Andreas Lehmkühler

[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/PDFTextStripper.java
[2] http://www.adobe.com/devnet/pdf/pdf_reference.html

 

