pdfbox-users mailing list archives

From Joël Kuiper <j...@joelkuiper.eu>
Subject Re: Add highlight/annotation to known string of text within a PDF
Date Tue, 09 Sep 2014 22:56:34 GMT
So I figured it out. Those were not a pleasant 6 hours ;-) 

I’ve subclassed PDFTextStripper to build a cache (called textCache) that maintains, per page,
a mapping between the characters and their TextPositions, instead of just returning the final
string.
Using a regular expression you can then find the TextPositions in the cache that match the
pattern.
From that list of TextPositions the bounding boxes can be calculated, and those can then be
added to the page as PDAnnotationTextMarkup annotations.
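
To make the idea concrete, here is a minimal sketch of the cache-and-match step in plain Java. It is not the code from the gist and it does not use PDFBox at all: Glyph is a hypothetical stand-in for TextPosition, and TextCache, Box and union are names made up for this illustration. It assumes one character per glyph, and that a match sits on a single line (a match that wraps across lines would need one box per line). In a real PDFTextStripper subclass the cache would presumably be filled from an overridden writeString(String, List<TextPosition>).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightSketch {

    /** Hypothetical stand-in for a PDFBox TextPosition: one character and its box. */
    static final class Glyph {
        final char ch;
        final double x, y, width, height; // (x, y) is the lower-left corner

        Glyph(char ch, double x, double y, double width, double height) {
            this.ch = ch; this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    /** Axis-aligned rectangle from lower-left (x1, y1) to upper-right (x2, y2). */
    static final class Box {
        final double x1, y1, x2, y2;

        Box(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
    }

    /** Per-page cache: character i of the text always belongs to glyph i. */
    static final class TextCache {
        private final StringBuilder text = new StringBuilder();
        private final List<Glyph> glyphs = new ArrayList<>();

        void append(Glyph g) {
            text.append(g.ch);
            glyphs.add(g);
        }

        /** Returns one bounding box per non-empty regex match on this page. */
        List<Box> find(Pattern pattern) {
            List<Box> boxes = new ArrayList<>();
            Matcher m = pattern.matcher(text);
            while (m.find()) {
                if (m.end() > m.start()) { // skip empty matches
                    boxes.add(union(glyphs.subList(m.start(), m.end())));
                }
            }
            return boxes;
        }
    }

    /** Smallest box that encloses every glyph in the run. */
    static Box union(List<Glyph> run) {
        double x1 = Double.POSITIVE_INFINITY, y1 = Double.POSITIVE_INFINITY;
        double x2 = Double.NEGATIVE_INFINITY, y2 = Double.NEGATIVE_INFINITY;
        for (Glyph g : run) {
            x1 = Math.min(x1, g.x);
            y1 = Math.min(y1, g.y);
            x2 = Math.max(x2, g.x + g.width);
            y2 = Math.max(y2, g.y + g.height);
        }
        return new Box(x1, y1, x2, y2);
    }

    public static void main(String[] args) {
        // Fake page: each character is 10 units wide and 12 high, on one line.
        TextCache cache = new TextCache();
        String line = "Hello PDF";
        for (int i = 0; i < line.length(); i++) {
            cache.append(new Glyph(line.charAt(i), 10.0 * i, 100.0, 10.0, 12.0));
        }
        for (Box b : cache.find(Pattern.compile("PDF"))) {
            System.out.println(b.x1 + "," + b.y1 + " -> " + b.x2 + "," + b.y2);
        }
    }
}
```

Each Box would then be turned into the quad points of a highlight annotation. Keeping the text and the glyph list index-aligned is what lets the regex match offsets be used directly as positions in the glyph list.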

The code is not pretty (haven’t done Java in a while and it was a rush job) but it may provide
a nice starting point for more serious stuff! 

https://gist.github.com/joelkuiper/9eb52555e02edb653dcf

Hopefully this is useful to someone else as well! 

Joël