pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: An empty column in a table
Date Fri, 27 Sep 2013 06:02:02 GMT
Hi,

On 26.09.2013 13:36, Tapani Vaulasto wrote:
> Hi,
> I use PDFBox 1.8.2 and this code to convert a PDF to a txt file:
>
> PDDocument pd = PDDocument.load(input);
> PDFTextStripper stripper = new PDFTextStripper();
> BufferedWriter wr = new BufferedWriter(new OutputStreamWriter(
>   new FileOutputStream(output)));
> stripper.writeText(pd, wr);
> wr.close(); // flushes the buffered text to the file
> pd.close(); // releases the document's resources
>
> The PDF document has tables.
> The problem is that sometimes a table row has one or more empty columns.
> For example here:
> http://www.tulli.fi/fi/yksityisille/autoverotus/taulukot/autot/au/1308.pdf
>
> On page 2 (of 44), some ALFA ROMEO rows have an empty column.
>
> Question: How can I get every column, including the empty ones, marked on each line written to the BufferedWriter?
Sorry, this can't be done with PDFBox. You have to analyze the text on your own.
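A minimal sketch of what "analyzing the text on your own" could look like, under stated assumptions: each word on a line comes with the x coordinate of its left edge (for example collected from PDFBox's TextPosition.getX() while stripping), and the left edge of every table column is already known (for example measured once from the header row). The class ColumnSplitter, its Word type and the columnStarts values are hypothetical and not part of PDFBox. The idea is simply to bucket words by x position, so a column that receives no word stays empty and shows up as an empty field in the output.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of PDFBox): assigns the words of one text
 * line to table columns by x coordinate, so empty columns stay visible.
 */
public class ColumnSplitter {

    /** A word plus the x coordinate of its left edge, e.g. from TextPosition.getX(). */
    public static final class Word {
        final String text;
        final double x;

        public Word(String text, double x) {
            this.text = text;
            this.x = x;
        }
    }

    /**
     * columnStarts holds the left edge of each column in ascending order.
     * Each word goes into the right-most column whose left edge is <= word.x;
     * words left of the first edge fall into column 0. Words should be given
     * in left-to-right order so multi-word cells keep their reading order.
     * Columns that receive no word remain empty strings.
     */
    public static String[] assignColumns(List<Word> words, double[] columnStarts) {
        StringBuilder[] cells = new StringBuilder[columnStarts.length];
        for (int i = 0; i < cells.length; i++) {
            cells[i] = new StringBuilder();
        }
        for (Word w : words) {
            int col = 0;
            for (int i = 0; i < columnStarts.length; i++) {
                if (w.x >= columnStarts[i]) {
                    col = i;
                }
            }
            if (cells[col].length() > 0) {
                cells[col].append(' ');
            }
            cells[col].append(w.text);
        }
        String[] result = new String[cells.length];
        for (int i = 0; i < cells.length; i++) {
            result[i] = cells[i].toString();
        }
        return result;
    }

    /** Joins the cells with a separator, so an empty column appears as e.g. ";;". */
    public static String toLine(String[] cells, String separator) {
        return String.join(separator, cells);
    }

    public static void main(String[] args) {
        // Made-up positions for illustration: the column starting at x = 200 is empty.
        List<Word> line = Arrays.asList(
                new Word("ALFA", 20), new Word("ROMEO", 55), new Word("1234", 310));
        double[] columnStarts = {0, 200, 300};
        System.out.println(toLine(assignColumns(line, columnStarts), ";"));
    }
}
```

With the made-up input above this prints `ALFA ROMEO;;1234`; the doubled separator marks the empty middle column, and each such line could then be written to the BufferedWriter instead of PDFTextStripper's default output. Fixed column edges are the simplest choice for a table with a stable layout; a table whose columns shift between pages would need the edges detected per page.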

> Regards
> Tapani Vaulasto

BR
Andreas Lehmkühler

