pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: Please Help.. (extracting bold words from pdf)
Date Sun, 13 Jan 2013 20:26:43 GMT

On 02.01.2013 17:51, Veli Hasanli wrote:
> Hi.
> I cannot use an external font file for extracting bold words from a PDF file.
> There is no problem extracting bold words from any PDF file that is
> written in a standard font.
> Let me explain my question with an example:
> (All files are attached to this email except one PDF file. It is more than 5
> MB and I could not send it via email. Please download it from:
> http://www.share.az/ldimjsxxc5dh/Azerbaycan_dilinin_izahli_lugeti___II_hisse__.pdf.html)
> 1- I can extract the bold words from deneme.pdf by using my code in
> PdftenSozCikarma.java
> 2- But I cannot do the same thing for myPdfDocument.pdf
> myPdfDocument.pdf is written in the Az_Times_Lat font (included in
> AzFonts.rar). You can easily check this by copying some words to MS Word and
> choosing Az_Times_Lat. (Of course, you must first add that font to the
> Fonts folder in Control Panel.)
There are several separate issues:

1) You can't rely on the font name to determine whether a font is bold.
2) You can't simply exchange a font by loading another one.
3) Do the "Adobe test" if you can't extract the text of a specific PDF using
PDFBox: open the document in Adobe Reader and try to copy and paste the
text into a text editor. If that doesn't work, it won't work with any other
PDF tool either.
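
To illustrate point 1, here is a minimal sketch of a bold check that looks at
font descriptor values first and treats the font name only as a weak hint. It
is not PDFBox code: FontInfo, Verdict and classify are hypothetical names made
up for this example, and the three fields are assumed to have been read from
the PDF's font descriptor beforehand (in PDFBox, PDFontDescriptor should offer
getFontName(), getFontWeight() and isForceBold() for this, but check the API of
your version). The threshold of 700 for "bold" follows the usual font weight
scale and is also an assumption. Both descriptor entries are optional in a PDF,
so even this check can come back empty.

```java
import java.util.Locale;

class BoldHeuristic {

    /** Outcome of the check; the name alone never yields LIKELY_BOLD. */
    enum Verdict { LIKELY_BOLD, NAME_HINT_ONLY, UNKNOWN }

    /** Hypothetical holder for values taken from a PDF font descriptor. */
    static final class FontInfo {
        final String fontName;    // e.g. "ABCDEF+Az_Times_Lat", may be null
        final float fontWeight;   // FontWeight entry, 0 if absent
        final boolean forceBold;  // ForceBold flag

        FontInfo(String fontName, float fontWeight, boolean forceBold) {
            this.fontName = fontName;
            this.fontWeight = fontWeight;
            this.forceBold = forceBold;
        }
    }

    static Verdict classify(FontInfo font) {
        // Descriptor values are the stronger evidence.
        if (font.forceBold || font.fontWeight >= 700f) {
            return Verdict.LIKELY_BOLD;
        }
        String name = font.fontName == null ? "" : font.fontName;
        // Embedded subsets carry a six-letter tag plus '+'; drop it so the
        // random tag cannot accidentally match "bold".
        if (name.indexOf('+') == 6) {
            name = name.substring(7);
        }
        // A name containing "bold" is only a hint; many bold fonts
        // (such as custom ones) do not say so in their name.
        if (name.toLowerCase(Locale.ROOT).contains("bold")) {
            return Verdict.NAME_HINT_ONLY;
        }
        return Verdict.UNKNOWN;
    }
}
```

A font like Az_Times_Lat with no usable descriptor values ends up as UNKNOWN
here, which is the honest answer: in that case the bold words cannot be
identified from the font information alone.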

> Please help me.
> Thanks a lot.
> Best regards

Andreas Lehmkühler
