pdfbox-users mailing list archives

From Eric Douglas <edoug...@blockhouse.com>
Subject Re: Reading text using TextPosition
Date Wed, 22 Apr 2015 17:03:37 GMT
What have you got so far?  Can you provide sample code to work with?

On Wed, Apr 22, 2015 at 12:02 PM, Hesham G. <heshamgneady@gmail.com> wrote:

> Frank,
>
> I have handled TextPositions using X and Y coordinates to detect new
> lines, as you suggested. It works fine, but I can't detect when a single
> sentence wraps across two lines. If you know a trick to detect that, it
> would help a lot.
>
> Best regards,
> Hesham
>
> ------------------------------------------------------------------------
>
> Hi Hesham,
>
> There is no newline character in a PDF. Only printable characters are
> saved, each with its X and Y coordinates.
> If you sort the TextPositions by Y and X, you can detect 'newlines' by
> finding an increase in Y and a decrease in X. However, this isn't
> foolproof, since things like subscripts and superscripts are out of order
> when sorted by Y. Where there are multiple columns, this won't work.
>
> Frank
>
>
> On Wed, Apr 22, 2015 at 7:33 AM, Hesham G. <heshamgneady@gmail.com> wrote:
>
>> Hello,
>>
>> When reading PDF text using TextPosition, is there a way to know if the
>> current character is a newline character?
>>
>> protected void processTextPosition( TextPosition text )  {
>>     // Prints a space if this is a newline character in the PDF file.
>>     System.out.println( text.getCharacter() );
>> }
>>
>>
>> Best regards,
>> Hesham
>>
>
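
For reference, the approach Frank describes in the quoted message could be sketched roughly as follows. This is only a sketch under stated assumptions, not PDFBox code: `LineGrouper` and `Glyph` are hypothetical names, and `Glyph` stands in for the values you would copy out of each TextPosition inside processTextPosition (for example text.getCharacter(), text.getX() and text.getY()). It assumes Y grows downward on the page and that glyphs whose Y values differ by no more than a chosen tolerance belong to the same line. Rather than looking for a drop in X, it groups by Y first and then orders each group by X, which is a small variation on the same idea. The second method is a punctuation heuristic for the wrapped-sentence question and will misfire on abbreviations, headings and similar cases. As Frank notes, none of this handles multiple columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LineGrouper {

    /** Hypothetical stand-in for the fields read from a TextPosition. */
    public static final class Glyph {
        final String text;
        final float x;
        final float y;

        public Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups glyphs into visual lines. A new line starts when a glyph's Y
     * exceeds the Y of the first glyph of the current line by more than
     * yTolerance. Within a line, glyphs are ordered left to right.
     */
    public static List<String> toLines(List<Glyph> glyphs, float yTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y)); // top to bottom

        List<String> lines = new ArrayList<>();
        List<Glyph> current = new ArrayList<>();
        float lineY = 0f;
        for (Glyph g : sorted) {
            if (!current.isEmpty() && g.y - lineY > yTolerance) {
                lines.add(flush(current)); // Y jumped: the previous line is complete
            }
            if (current.isEmpty()) {
                lineY = g.y; // remember the Y of the line's first glyph
            }
            current.add(g);
        }
        if (!current.isEmpty()) {
            lines.add(flush(current));
        }
        return lines;
    }

    /** Orders one line's glyphs by X, concatenates them, and clears the buffer. */
    private static String flush(List<Glyph> line) {
        line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
        StringBuilder sb = new StringBuilder();
        for (Glyph g : line) {
            sb.append(g.text);
        }
        line.clear();
        return sb.toString();
    }

    /**
     * Heuristic: a line that does not end in '.', '!' or '?' is assumed to
     * continue on the next line, so the two are joined with a space.
     */
    public static List<String> joinWrappedSentences(List<String> lines) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(trimmed);
            char last = trimmed.charAt(trimmed.length() - 1);
            if (last == '.' || last == '!' || last == '?') {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            sentences.add(current.toString()); // trailing text without end punctuation
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Two visual lines, supplied out of order, forming one sentence.
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("c", 10f, 112f), new Glyph("b", 20f, 100f),
                new Glyph(".", 20f, 112f), new Glyph("a", 10f, 100f));
        System.out.println(joinWrappedSentences(toLines(glyphs, 2f)));
    }
}
```

The tolerance is the main knob: a value somewhat below the line spacing but above the typical superscript or subscript offset keeps raised and lowered glyphs on their own line. A fixed tolerance is a simplification; scaling it by the glyph height would probably be more robust.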
