pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: Mismatch between XeLaTeX fontspec and Apache PDFBox
Date Mon, 29 Jan 2018 11:17:05 GMT
On 29.01.2018 at 11:02, Flynn, Peter wrote:
> 
> /bin/java -jar /usr/local/src/pdfbox-app-1.8.4.jar \
>                      ExtractText -html -force V$pubno-crop.pdf \
>                      V$pubno-crop.html

You are using an ancient version of PDFBox. Please update to a more recent
version such as 1.8.13 or, better, 2.0.8.

Andreas

> --
> Peter Flynn | Academic & Collaborative Technologies | University College Cork IT Services | ☎ +353 21 490 2609 | ✉ pflynn@ucc.ie | 🌍 www.ucc.ie
> 
> 
> 
> On 2018-01-28 12:30:58+00:00 Tilman Hausherr wrote:
> 
> Hi,
> I can only answer about PDFBox... no PDF has anything bold. Both have
> something italic.
> 
> Yes, sorry about that. I picked an example that only has italic.
> 
> The PDF without fontspec doesn't have the "é".
> 
> Correct. But it does convert with PDFBox and identifies the italics.
> 
> The PDF with fontspec can be converted to HTML with "ExtractText -html"
> 
> I convert with the command
> 
> /bin/java -jar /usr/local/src/pdfbox-app-1.8.4.jar ExtractText -html -force filename.pdf filename.html
> 
> and the results were as given in the .zip file: no italics. What version are you using?
> 
> P
> 

