pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PDFBox and Pattern
Date Mon, 09 Jan 2017 16:42:36 GMT
On 09.01.2017 at 13:23, criekenb@web.de wrote:
>   
>
> Hi,
> thanks for the reply, but I only have the string that the pattern matches.
> In PDF there is a table like this:
> 10110  Paperbox  3,49
> 30220N  Scissors    7,99

It seems that what you mean is that text extraction doesn't return 
the whole line.

Did you try the sort option?
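
A minimal sketch of matching whole table lines, under some assumptions: the text has already been extracted (for example with PDFTextStripper after calling setSortByPosition(true), which is presumably the sort option meant here), each table row arrives as one line, and prices use a decimal comma as in the sample. The class name TableLineMatcher, the method parseRows and the extended pattern are made up for illustration; the pattern builds on the one from the original question but also captures the optional letter, the description and the price.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: works on text that was already extracted.
class TableLineMatcher {
    // Assumed row layout: article number (5 digits, optional capital letter),
    // description, price with a decimal comma.
    private static final Pattern ROW = Pattern.compile(
            "^\\s*([0-9]{5}[A-Z]?)\\s+(.+?)\\s+([0-9]+,[0-9]{2})\\s*$");

    // Returns one {number, description, price} array per line that matches.
    static List<String[]> parseRows(String extractedText) {
        List<String[]> rows = new ArrayList<>();
        // \R matches any line break, so this works for \n and \r\n alike.
        for (String line : extractedText.split("\\R")) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                rows.add(new String[] { m.group(1), m.group(2), m.group(3) });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample rows taken from the table quoted in this thread.
        String text = "10110  Paperbox  3,49\n30220N  Scissors    7,99\n";
        for (String[] row : parseRows(text)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Matching line by line keeps the whole row available once the article number is found. If the extracted rows are still split across lines after sorting, this approach will not help and a table-oriented tool is probably the better route.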

Re attachments: upload your file to a file-sharing host.

Tilman

>   
>
> My pattern only matches the first column. To get the description and price, I need the whole line.
> So does PDFBox have a command to get the complete current line?
>   
> thx
>
>
>
> Sent: Monday, 09 January 2017 at 07:25
> From: "Tilman Hausherr" <THausherr@t-online.de>
> To: users@pdfbox.apache.org
> Subject: Re: PDFBox and Pattern
> On 09.01.2017 at 01:12, criekenb@web.de wrote:
>> Dear all,
>>
>> I have a pdf with tables (and some other stuff).
>>
>> I used
>> Pattern p = Pattern.compile("([0-9]{5})[A-Z]?");
>> to get the lines with 5 digits and an optional letter.
>>
>> How to read the whole line if a match was found?
> You already have the line if you checked whether it matches.
>
>> Otherwise: Is there a good alternative for reading in the data directly as a table?
> Try tabula
>
>> Other topic: Is it not possible to send attachments here?
> No
>
> Tilman
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>   
>
>

