pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: space between words
Date Sun, 04 Jun 2017 06:49:40 GMT
Am 04.06.2017 um 07:17 schrieb Tilman Hausherr:
> Here's what I got with 2.0.6 using the ExtractText command line 
> application:
>
> JP XVII
> THE JAPANESE PHARMACOPOEIA
> SEVENTEENTH EDITION
> Official from April 1, 2016
> English Version
> THE MINISTRY OF HEALTH, LABOUR AND WELFARE
> Notice: This English Version of the Japanese Pharmacopoeia is published
> for the convenience of users unfamiliar with the Japanese language. When
> and if any discrepancy arises between the Japanese original and its 
> English
> translation, the former is authentic. 



I forgot to mention: if you want text like this, use PDFTextStripper.

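// doc is an already loaded PDDocument, e.g. PDDocument.load(new File("input.pdf")) in 2.0.x
PDFTextStripper stripper = new PDFTextStripper(); // org.apache.pdfbox.text.PDFTextStripper
stripper.setStartPage(1); // 1 based, inclusive
stripper.setEndPage(3);   // inclusive
String text = stripper.getText(doc);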


This will extract pages 1 to 3 (both ends inclusive).
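To make the page range semantics concrete, here is a small stdlib-only model of the filter the stripper applies. It assumes the PDFBox 2.0.x behaviour where page numbers are 1-based, both the start and end page are inclusive, and the default end page is Integer.MAX_VALUE (i.e. "up to the last page"). The class PageRangeDemo and its method selectedPages are hypothetical illustrations, not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (NOT part of PDFBox) that models the page filter
 * assumed for PDFTextStripper: 1-based page numbers, inclusive on both ends.
 */
public class PageRangeDemo {

    /** Returns the 1-based page numbers a stripper configured with
     *  setStartPage(startPage) / setEndPage(endPage) would visit
     *  in a document with pageCount pages. */
    static List<Integer> selectedPages(int pageCount, int startPage, int endPage) {
        List<Integer> pages = new ArrayList<>();
        for (int pageNo = 1; pageNo <= pageCount; pageNo++) {
            // Inclusive check on both ends; an end page beyond the
            // document simply stops at the last page.
            if (pageNo >= startPage && pageNo <= endPage) {
                pages.add(pageNo);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(selectedPages(10, 1, 3));                 // [1, 2, 3]
        System.out.println(selectedPages(10, 8, Integer.MAX_VALUE)); // [8, 9, 10]
        System.out.println(selectedPages(2, 1, 3));                  // [1, 2]
    }
}
```

So setStartPage(1) plus setEndPage(3) covers three pages, and an end page larger than the page count is harmless.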

Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

