pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Couldn't be retrieve some of character's locations.
Date Tue, 22 Aug 2017 15:57:40 GMT
Hi,

Sorry about that.

What PDFBox version are you using? The current one is 2.0.7. The generic 
example is PrintTextLocations.java, and DrawPrintTextLocations.java shows 
the same information visually (see output: http://imgur.com/a/1awtu )

For which characters were you unable to retrieve the location? Please 
describe where they are on the page, e.g. "top left", or explain what 
you were expecting and what was missing.
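
In case it helps while you check, here is a minimal sketch of one possible 
reason a character-by-character comparison like the one quoted below can 
lose positions. It is written without PDFBox so that it runs on its own: 
Glyph is a hypothetical stand-in for TextPosition, and the sample text and 
coordinates are invented. The assumption is that the extracted string can 
contain separators (spaces, line breaks) that have no glyph, and that one 
glyph can map to several characters (e.g. a ligature), so a loop that only 
advances on an exact one-character match stalls at the first such glyph.

```java
import java.util.Arrays;
import java.util.List;

public class GlyphAlignmentSketch {

    /** Hypothetical stand-in for PDFBox's TextPosition: glyph text plus an x coordinate. */
    public static class Glyph {
        final String unicode;
        final float x;

        Glyph(String unicode, float x) {
            this.unicode = unicode;
            this.x = x;
        }
    }

    /** Invented sample: the fourth glyph is a ligature that maps to three characters. */
    public static List<Glyph> sampleGlyphs() {
        return Arrays.asList(
                new Glyph("a", 10f), new Glyph("n", 16f),
                new Glyph("o", 28f), new Glyph("ffi", 34f),
                new Glyph("c", 46f), new Glyph("e", 52f));
    }

    /** Same idea as the quoted loop: split into single characters, advance only on an exact match. */
    public static int naiveMatches(String text, List<Glyph> glyphs) {
        int matched = 0;
        int g = 0;
        for (String ch : text.split("")) {
            if (g < glyphs.size() && ch.equals(glyphs.get(g).unicode)) {
                matched++;
                g++;
            }
        }
        return matched;
    }

    /** Alternative: walk the glyphs and look for each glyph's full text, skipping separators. */
    public static int tolerantMatches(String text, List<Glyph> glyphs) {
        int matched = 0;
        int pos = 0;
        for (Glyph glyph : glyphs) {
            int at = text.indexOf(glyph.unicode, pos);
            if (at < 0) {
                continue; // glyph text does not occur in the remaining output
            }
            matched++;
            pos = at + glyph.unicode.length();
        }
        return matched;
    }

    public static void main(String[] args) {
        String text = "an office";
        System.out.println(naiveMatches(text, sampleGlyphs()));    // prints 3
        System.out.println(tolerantMatches(text, sampleGlyphs())); // prints 6
    }
}
```

The sketch does not handle glyphs that arrive in a different order than 
the output text. In PDFBox itself, the PrintTextLocations example avoids 
the alignment step altogether by overriding 
writeString(String, List<TextPosition>), which receives each piece of 
text together with its positions.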

Tilman

On 22.08.2017 at 17:44, 二川村田 wrote:
> Hello
>
> I tried to extract text from the PDF below.
>
> http://jpdb.nihs.go.jp/jp17e/000217651.pdf
>
> On the first page, there were some characters whose locations I could
> retrieve, but there were also characters whose locations I couldn't.
>
> What is the reason for this problem?
>
>
> ========================
> my source to retrieve characters' locations
> ========================
>
> =====================
> // class extends PDFTextStripper
> class PDFTextCordinateStripper extends PDFTextStripper {
>
>     // Every TextPosition seen during extraction, in processing order.
>     public List<TextPosition> list_text = new ArrayList<TextPosition>();
>
>     public PDFTextCordinateStripper() throws IOException {
>         super();
>     }
>
>     @Override
>     protected void processTextPosition(TextPosition text) {
>         super.processTextPosition(text);
>         list_text.add(text);
>     }
>
> }
>
>
> =====================
> // main (omitted)
> PDFTextCordinateStripper stripper = new PDFTextCordinateStripper();
>
> int len_page = doc.getNumberOfPages();
> for (int ind = 1; ind <= len_page; ind++) {
>
>     PDPage pg = doc.getPage(ind - 1);
>     // Assumption: page size comes from the media box
>     // (pg_w and pg_h are not defined in the posted code).
>     float pg_w = pg.getMediaBox().getWidth();
>     float pg_h = pg.getMediaBox().getHeight();
>
>     String str_page_num = "PageNum: " + ind;
>     String str_page_size = "Width: " + pg_w + "\tHeight: " + pg_h;
>     System.out.println(str_page_num + "\t" + str_page_size);
>
>     // Extract a single page. Each getText() call adds to list_text,
>     // so call it only once per page.
>     stripper.list_text.clear();
>     stripper.setStartPage(ind);
>     stripper.setEndPage(ind);
>     String p_text = stripper.getText(doc);
>
>     // Walk the extracted text one character at a time and try to
>     // pair each character with the next collected TextPosition.
>     Iterator<String> it_str = Arrays.asList(p_text.split("")).iterator();
>     int ind_tp = 0;
>     List<TextPosition> list_tp = stripper.list_text;
>     int len_list_tp = list_tp.size();
>     while (it_str.hasNext()) {
>         String ch = it_str.next();
>         String str_rec = "Text: " + ch;
>
>         if (ind_tp < len_list_tp) {
>             TextPosition tp = list_tp.get(ind_tp);
>             if (ch.equals(tp.toString())) {
>                 str_rec += "\tx: " + tp.getX()
>                         + "\ty: " + tp.getY()
>                         + "\tw: " + tp.getWidth()
>                         + "\th: " + tp.getHeight()
>                         + "\tfont_size: " + tp.getFontSizeInPt();
>                 ind_tp++;
>             }
>         }
>
>         System.out.println(str_rec);
>     }
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>



