pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Identify not visible characters - Overlapped characters
Date Wed, 28 Dec 2016 20:49:57 GMT
On 28.12.2016 at 21:32, Manuel Aristarán wrote:
>> On Dec 28, 2016, at 8:18 AM, Tilman Hausherr <THausherr@t-online.de> wrote:
>>
>> […]
>> Try also https://github.com/tabulapdf/, I wonder how they handle this problem.
> Hi, main author of Tabula here.
>
> We've come across that case many times. Some spreadsheet->PDF generators clip a cell's
> content to the extent of its container. We handle it by simply detecting whether a character
> is inside the current clipping path [1].
>
> Cheers,
>
> [1] https://github.com/tabulapdf/tabula-java/blob/master/src/main/java/technology/tabula/ObjectExtractor.java#L342

Ah, you're extending PageDrawer. That of course gives you the clipping
path on a silver platter :-)

Tilman
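
For readers of the archive, the check Manuel describes can be sketched with plain java.awt.geom types. This is only an illustration under stated assumptions, not Tabula's actual code: the ClipFilter class, the Glyph holder, and both method names are hypothetical, and the clip is passed in as an ordinary Shape. In a PageDrawer subclass the clip would presumably be read from getGraphicsState().getCurrentClippingPath() at the moment each glyph is shown, with the glyph bounds transformed into the same coordinate space.

```java
import java.awt.Shape;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only the glyphs that fall inside a clipping path.
final class ClipFilter {

    // Hypothetical holder for one extracted character and its bounding box.
    static final class Glyph {
        final String text;
        final Rectangle2D bounds;

        Glyph(String text, Rectangle2D bounds) {
            this.text = text;
            this.bounds = bounds;
        }
    }

    // A glyph counts as visible when its whole bounding box lies inside the clip.
    // A null clip is treated as "no clipping" (an assumption of this sketch).
    static boolean isInsideClip(Shape clip, Rectangle2D glyphBounds) {
        return clip == null || clip.contains(glyphBounds);
    }

    // Returns the text of every glyph that passes the clip test, in input order.
    static List<String> visibleText(Shape clip, List<Glyph> glyphs) {
        List<String> visible = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (isInsideClip(clip, g.bounds)) {
                visible.add(g.text);
            }
        }
        return visible;
    }
}
```

Requiring full containment drops characters that straddle the cell border. Using clip.intersects(glyphBounds) instead would keep partially visible characters; which of the two suits a given document is a judgment call.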
