pdfbox-users mailing list archives

From Pierre Dubillot <alexcou...@gmail.com>
Subject Re: Extraction problems with PDFTextStripperByArea
Date Fri, 24 Jul 2015 08:38:42 GMT
Hi,
I've got great news. It was exactly what you described: the
LowerLeftY was not set to 0, so the mediabox was broken.
I spent about a week trying to solve this problem, so your help is
really appreciated!
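
For anyone who finds this thread later, here is a rough sketch of the idea Tilman suggested: shifting an extraction region by the media box origin of each page. It is not my actual extract code. The class and method names are hypothetical, and the sign of the correction is an assumption (that the stripper reports text positions as page height minus y without compensating for a non-zero media box origin); other PDFBox versions may already compensate, so verify against your own files.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox. It only does the arithmetic;
// the PDFBox calls it would be combined with are named in the comments.
final class MediaBoxRegion {

    /**
     * Shifts a region given relative to the visible page's top-left corner
     * by the media box origin.
     *
     * ASSUMPTION: text positions are computed as (pageHeight - y) in
     * unshifted user space, so a media box starting at lowerLeftY moves the
     * visible text up by lowerLeftY in the stripper's coordinates.
     */
    static Rectangle2D.Double shiftForMediaBox(Rectangle2D visible,
                                               double lowerLeftX,
                                               double lowerLeftY) {
        return new Rectangle2D.Double(
                visible.getX() + lowerLeftX,
                visible.getY() - lowerLeftY,
                visible.getWidth(),
                visible.getHeight());
    }

    public static void main(String[] args) {
        // Region as measured on the visible page (illustrative values).
        Rectangle2D visible = new Rectangle2D.Double(50, 100, 200, 20);

        // In real code the offsets would come from each page, e.g.
        //   PDRectangle box = page.getMediaBox();
        //   box.getLowerLeftX(), box.getLowerLeftY()
        // and the result would go to PDFTextStripperByArea.addRegion(name, rect)
        // before calling extractRegions(page) and getTextForRegion(name).
        System.out.println(shiftForMediaBox(visible, 0, 0));
        System.out.println(shiftForMediaBox(visible, 0, 842));
    }
}
```

With a media box origin of (0, 0) the region is returned unchanged, which is why the problem only shows up on pages where LowerLeftY is not 0.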

Best regards,
Pierre

2015-07-23 21:38 GMT+02:00 Tilman Hausherr <THausherr@t-online.de>:

> I ran the ExtractText command-line utility. In the original PDF, CAVANNA
> appears once on each day, so 7 times in all. In the "new" file, when
> extracting everything, it appears 49 times.
>
> This suggests that the text extraction logic doesn't take the
> cropbox / mediabox / whatever into account. Hard to tell whether this is OK or not.
>
> It would be nice if you could upload the extraction code.
>
> Can you try to change your extraction code so that it uses the changing "y"
> value (probably getLowerLeftY()) of the media box (PDPage.getMediaBox())
> on each page?
>
>
> Tilman
