pdfbox-users mailing list archives

From 二川村田 <sutenik...@gmail.com>
Subject Re: space between words
Date Sun, 04 Jun 2017 14:45:32 GMT
Thank you for your reply, Mr. Hausherr.

I am sending my code.

It looks similar to the code you sent.

I want to do this from a Java program, not from the command-line application.

I am using the library pdfbox-2.0.6.jar.

=====================
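import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

// Extends PDFTextStripper and records every TextPosition it processes.
class PDFTextCordinateStripper extends PDFTextStripper {

    public List<TextPosition> list_text = new ArrayList<TextPosition>();

    public PDFTextCordinateStripper() throws IOException {
        super();
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        super.processTextPosition(text);
        list_text.add(text);
    }

}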


=====================
// main (omitted)
PDFTextCordinateStripper stripper = new PDFTextCordinateStripper();

int len_page = doc.getNumberOfPages();
for (int ind = 1; ind <= len_page; ind++) {

    // getPage() is 0-based; setStartPage()/setEndPage() are 1-based.
    PDPage pg = doc.getPage(ind - 1);

    // Page size read from the media box (an assumption; the definitions
    // of pg_w and pg_h are in the omitted part).
    float pg_w = pg.getMediaBox().getWidth();
    float pg_h = pg.getMediaBox().getHeight();

    String str_page_num = "PageNum: " + ind;

    String str_page_size =
            "Width: " + pg_w
            + "\tHeight: " + pg_h;

    System.out.println(str_page_num + "\t" + str_page_size);

    // Collect the TextPositions of this page only.
    stripper.list_text.clear();
    stripper.setStartPage(ind);
    stripper.setEndPage(ind);
    stripper.getText(doc);

    for (TextPosition rec : stripper.list_text) {
        String str_rec
                = "Text: " + rec.toString()
                + "\tx: " + rec.getX()
                + "\ty: " + rec.getY()
                + "\tw: " + rec.getWidth()
                + "\th: " + rec.getHeight()
                + "\tfont_size: " + rec.getFontSizeInPt();
        System.out.println(str_rec);
    }
}
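
For reference, here is a minimal sketch of how the x and width values printed above could be used to guess where a space belongs between two glyphs on the same line. It is plain Java and does not call PDFBox. The `Glyph` class, the `joinWithSpaces` method, and the `gapFactor` threshold are hypothetical names for illustration only; the heuristic is an assumption and is not how PDFTextStripper itself decides on word separators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Glyph stands in for the values read from a
// TextPosition (getUnicode(), getX(), getWidth()). It is not a PDFBox class.
class WordGapSketch {

    static final class Glyph {
        final String text;
        final float x;
        final float width;

        Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    // Joins the glyphs of one line, inserting a space whenever the horizontal
    // gap to the previous glyph exceeds gapFactor times that glyph's width.
    // The threshold is an assumed heuristic and would need tuning per document.
    static String joinWithSpaces(List<Glyph> glyphs, float gapFactor) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                float gap = g.x - (prev.x + prev.width);
                if (gap > gapFactor * prev.width) {
                    sb.append(' ');
                }
            }
            sb.append(g.text);
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = new ArrayList<>();
        line.add(new Glyph("H", 10f, 5f));
        line.add(new Glyph("i", 15f, 2f));
        line.add(new Glyph("y", 21f, 5f)); // 4 units after "i" ends at 17
        line.add(new Glyph("o", 26f, 5f));
        line.add(new Glyph("u", 31f, 5f));
        System.out.println(joinWithSpaces(line, 0.5f)); // prints "Hi you"
    }
}
```

In the loop above, each `rec` would supply one such entry. The sketch assumes the glyphs are already sorted left to right and belong to a single line.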
