pdfbox-users mailing list archives

From Fernando Almeida <fernandoalmeida...@gmail.com>
Subject Font properties
Date Thu, 27 Dec 2012 20:55:54 GMT
Hi everyone.

I'm new to PDFBox, but by following some examples I managed to convert a
PDF to text.

The problem is that I want to extract only some of the information, not all
of the text. I made a list of keywords and, using a Matcher, I can find the
matching words. But this is not enough, because I need the text that follows
each keyword.

I'll give an example of the text:

*keyword1:* text I want to associate1 *keyword2:* text in the same line, I
want it too
*keyword3:* it could be one or more keywords in the same line as above

I can't figure out how to do it. The only option I can think of is to use
the fact that all keywords are in bold and the associated values are in a
normal font.

Can PDFBox read the font properties? Is there another way to do it?
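
For reference, here is a rough sketch of one font-independent possibility,
working only on the already extracted plain text. It assumes every keyword
is known in advance and is followed by a colon, as in the example above, and
it treats everything up to the next keyword (or the end of the text) as the
value. The class name KeywordValues and the method extract are hypothetical
names made up for this illustration; nothing here is part of PDFBox.

```java
import java.util.*;
import java.util.regex.*;

class KeywordValues {
    // Maps each keyword found in the text to the text that follows it,
    // up to the next keyword or the end of the input.
    // Assumption: keywords appear in the text followed by a colon.
    // If a keyword occurs more than once, the last value wins.
    static Map<String, String> extract(String text, List<String> keywords) {
        Map<String, String> out = new LinkedHashMap<>();
        if (keywords.isEmpty()) {
            return out;
        }
        // Build an alternation of the literal keywords: (kw1|kw2|...)\s*:
        StringBuilder alt = new StringBuilder();
        for (String k : keywords) {
            if (alt.length() > 0) {
                alt.append('|');
            }
            alt.append(Pattern.quote(k));
        }
        Matcher m = Pattern.compile("(" + alt + ")\\s*:").matcher(text);

        String current = null; // keyword whose value is being collected
        int valueStart = 0;    // index just after that keyword's colon
        while (m.find()) {
            if (current != null) {
                out.put(current, clean(text.substring(valueStart, m.start())));
            }
            current = m.group(1);
            valueStart = m.end();
        }
        if (current != null) {
            out.put(current, clean(text.substring(valueStart)));
        }
        return out;
    }

    // Collapses line breaks and repeated spaces so wrapped values read as one line.
    static String clean(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }
}
```

This would break if a value itself contains a keyword followed by a colon,
which is why the bold/normal font distinction still looks more reliable to me.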

Thanks in advance

Fernando Almeida
