pdfbox-users mailing list archives

From Gilad Denneboom <gilad.denneb...@gmail.com>
Subject Re: Add highlight/annotation to known string of text within a PDF
Date Tue, 02 Sep 2014 21:29:19 GMT
Yes, that's pretty much how you can do it, and yes, it's very tricky to
implement.
I have in fact written the code that does something like that and I use it
in many of my applications.

Acrobat and Preview probably do something similar, yes.
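
For the archive, here is a minimal sketch of the approach described in the quoted message below: keep a mapping from each extracted character to its position, search the extracted text, then merge the matching glyph boxes into one rectangle per line. It deliberately does not call PDFBox. `Glyph` is a hypothetical stand-in for the fields you would read from each TextPosition, `line` is only a fixture for building sample input, and the one-glyph-per-character assumption will not hold for every PDF. Treat it as an illustration under those assumptions, not as the code mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class HighlightSketch {

    /** Hypothetical stand-in for a TextPosition: one character and its box. */
    static final class Glyph {
        final char ch;
        final float x, y, width, height; // lower-left corner and size, page units

        Glyph(char ch, float x, float y, float width, float height) {
            this.ch = ch;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Fixture only: lays out a string on one line with a fixed glyph width. */
    static List<Glyph> line(String s, float x, float y, float w, float h) {
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            glyphs.add(new Glyph(s.charAt(i), x + i * w, y, w, h));
        }
        return glyphs;
    }

    /**
     * Finds every occurrence of needle in the glyph sequence and returns
     * boxes as {minX, minY, maxX, maxY}. A match that wraps onto a new
     * line (different y) yields one box per line.
     */
    static List<float[]> findBoxes(List<Glyph> glyphs, String needle) {
        List<float[]> boxes = new ArrayList<>();
        if (needle.isEmpty()) {
            return boxes;
        }
        // Assumption: exactly one glyph per character, in reading order,
        // so string indices map directly onto glyph indices.
        StringBuilder text = new StringBuilder();
        for (Glyph g : glyphs) {
            text.append(g.ch);
        }
        int from = 0;
        int start;
        while ((start = text.indexOf(needle, from)) >= 0) {
            int end = start + needle.length();
            float[] box = null;
            float lineY = 0f;
            for (int i = start; i < end; i++) {
                Glyph g = glyphs.get(i);
                if (box == null || Math.abs(g.y - lineY) > 0.01f) {
                    // First glyph of the match, or the match moved to a new line.
                    box = new float[] {g.x, g.y, g.x + g.width, g.y + g.height};
                    boxes.add(box);
                    lineY = g.y;
                } else {
                    // Same line: grow the current box to cover this glyph.
                    box[0] = Math.min(box[0], g.x);
                    box[2] = Math.max(box[2], g.x + g.width);
                    box[3] = Math.max(box[3], g.y + g.height);
                }
            }
            from = end;
        }
        return boxes;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = line("foo bar foo", 0f, 100f, 5f, 10f);
        for (float[] b : findBoxes(glyphs, "foo")) {
            System.out.println(b[0] + " " + b[1] + " " + b[2] + " " + b[3]);
        }
    }
}
```

Each returned rectangle would then be written to the page as a highlight annotation. The tricky parts in practice are the ones this sketch skips: whitespace that the text extraction inserts without a matching glyph, ligatures, and rotated text.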


On Tue, Sep 2, 2014 at 11:11 PM, Joël Kuiper <me@joelkuiper.eu> wrote:

> Hey Maruan,
>
> Thought that would be easier … but unless there’s a way I’m overlooking
> it’s actually really tricky.
> I guess it would mean lifting the extraction code from PDFTextStripper and,
> instead of returning just the string … also returning a mapping to the
> TextPosition objects.
> Then somehow work out the bounding boxes of the text from those
> TextPosition objects … and write them as annotations separately, I guess.
>
> It all seems rather complicated … is this the route Acrobat and
> Preview.app etc take to make the highlighting work?
>
> Joël
>
>
> > On 02 Sep 2014, at 19:58, Maruan Sahyoun <sahyoun@fileaffairs.de> wrote:
> >
> > Hi Joël,
> >
> > do you already have the text positions on the page?
> >
> > Maruan Sahyoun
> >
> >> On 02.09.2014 at 19:52, "Joël Kuiper" <joel@joelkuiper.eu> wrote:
> >>
> >> Well, they're uploaded. Basically a user uploads a PDF, the system runs
> >> some prediction / pattern matching on the text and the user receives the
> >> PDF with the predicted parts highlighted.
> >>
> >>
> >> I'm just a bit confused on how to (properly) do the last part.
> >> —
> >> https://joelkuiper.eu
> >>
> >>> On Tue, Sep 2, 2014 at 7:30 PM, Jan Tosovsky <j.tosovsky@email.cz> wrote:
> >>>
> >>>> On 2014-09-02 Joël Kuiper wrote:
> >>>>
> >>>> The problem is that I have a PDF for which I want to highlight a known
> >>>> string with a color.
> >>> What is the PDF produced from? It is always better to do this kind of
> >>> job in the source document.
> >>> Jan
>
>
