pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: How to retrieve rectangle bounds for Chart element in the PDF document
Date Thu, 23 Nov 2017 16:53:10 GMT
There is no out-of-the-box solution for this (or for the other posting).
PDF is not a format with <TABLE>...</TABLE> or <CHART>...</CHART>
syntax. A PDF page is just graphics. You can get the lines / shapes with this:
https://stackoverflow.com/questions/38931422/pdfbox-2-0-2-calling-of-pagedrawer-processpage-method-caught-exceptions
However, you'll still have to work out for yourself where your table /
chart is.

To get an idea of how tricky this is, open your file with
PDFDebugger and look at the "contents" part. The operators you see are
explained in the PDF 32000 specification (
https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
), in the "Operator Summary" annex. Start with the operators m, l, c, f
and s.

Your shape object is this:

   0.357 0.608 0.835 rg
   125.06 715.44 m
   125.06 717.96 127.1 720 129.61 720 c
   204 720 l
   206.51 720 208.56 717.96 208.56 715.44 c
   208.56 697.21 l
   208.56 694.69 206.51 692.65 204 692.65 c
   129.61 692.65 l
   127.1 692.65 125.06 694.69 125.06 697.21 c
   h
   f*
   1 w
   0.255 0.443 0.612 RG
   125.06 715.44 m
   125.06 717.96 127.1 720 129.61 720 c
   204 720 l
   206.51 720 208.56 717.96 208.56 715.44 c
   208.56 697.21 l
   208.56 694.69 206.51 692.65 204 692.65 c
   129.61 692.65 l
   127.1 692.65 125.06 694.69 125.06 697.21 c
   h
   S
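
As a rough illustration of what "finding the bounds" means for a path like
the one above, here is a minimal sketch in plain Java. It is not PDFBox code:
the class PathBounds and its bounds method are hypothetical names made up for
this example. It assumes the content stream is already available as text,
that it contains only numeric operands and simple operators, and that no
transformation matrix (cm operator) is in effect. For curves it uses the
Bezier control points, which gives a box that contains the curve but may be
slightly larger than the exact outline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of PDFBox.
class PathBounds {

    /** Returns {minX, minY, maxX, maxY} over all points used by the m, l and c operators. */
    static double[] bounds(String content) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        List<Double> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            try {
                operands.add(Double.parseDouble(token)); // numeric operand
                continue;
            } catch (NumberFormatException e) {
                // Not a number, so treat the token as an operator.
            }
            if (token.equals("m") || token.equals("l") || token.equals("c")) {
                // The operands of m, l and c are x/y pairs (1, 1 and 3 pairs).
                for (int i = 0; i + 1 < operands.size(); i += 2) {
                    double x = operands.get(i);
                    double y = operands.get(i + 1);
                    minX = Math.min(minX, x);
                    maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y);
                    maxY = Math.max(maxY, y);
                }
            }
            // Any other operator (rg, w, h, f*, S, ...) just discards its operands.
            operands.clear();
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        // The filled path of the shape object shown above.
        String shape =
              "0.357 0.608 0.835 rg\n"
            + "125.06 715.44 m\n"
            + "125.06 717.96 127.1 720 129.61 720 c\n"
            + "204 720 l\n"
            + "206.51 720 208.56 717.96 208.56 715.44 c\n"
            + "208.56 697.21 l\n"
            + "208.56 694.69 206.51 692.65 204 692.65 c\n"
            + "129.61 692.65 l\n"
            + "127.1 692.65 125.06 694.69 125.06 697.21 c\n"
            + "h\n"
            + "f*\n";
        double[] b = bounds(shape);
        System.out.printf(Locale.ROOT, "x=%.2f y=%.2f width=%.2f height=%.2f%n",
                b[0], b[1], b[2] - b[0], b[3] - b[1]);
    }
}
```

For the shape above, the points span x from 125.06 to 208.56 and y from
692.65 to 720, in the page's default user space. A real document would need
the graphics state (transformations, clipping, nested form XObjects) handled
as well, which is what the stream engine approach in the linked answer is for.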

The chart in the other file is more difficult to find; I didn't even try.

Tilman

On 23.11.2017 at 05:00, S S Satyanarayana Damarla wrote:
> Looks like the PDF document attachment didn't get through.
>
> I have uploaded the PDF document at the following location:
>
> https://drive.google.com/file/d/1uYoQweCVbO4cNQiMnJuVjM1WZu7Cr7Ae/view
>
> Please use the above link to access the PDF document that contains this chart.
>
> Appreciate any help on this.
>
> Thanks,
> -Satya
>
> On 2017-11-22 18:06, S S Satyanarayana Damarla <s...@oracle.com> wrote:
>
>> Hi,
>> I have attached a PDF document which contains a chart.
>> For our project, we need to retrieve the rectangle bounds of the chart in
>> the attached PDF document. The chart is not recognized as an image object
>> (PDImageXObject). It looks like it is embedded in the content stream.
>> I would appreciate it if you could help me with sample code for retrieving
>> the rectangle bounds of the chart in the attached PDF document.
>> Thanks
>> -Satya


