pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: fake lines
Date Tue, 17 Jan 2017 17:47:35 GMT
On 17.01.2017 at 10:19, Cem Dayanik (Ibtech-Software Infrastructure) wrote:
> Original file(one page - removed texts):
>   4AOnePageWithoutText.pdf
>
> Lines.pdf is generated with parsed line infos.
>
> https://we.tl/M0M5eM4weR
>
> This is the set of lines I am trying to get rid of
>
> VERTICALLINE:Point2D.Double[142.0, 116.0] Point2D.Double[142.0, 276.0]


found with PDFDebugger:


[ ] 0 d
2 J
1 1 1 rg <=== white
1 w
2 J
0.2 0.2 0.2 RG  <= grey
[ ] 0 d
142 116 m
222 116 l
222 116 222 116 222 116 c
222 136 l
222 136 222 136 222 136 c
142 136 l
142 136 142 136 142 136 c
142 116 l
142 116 142 116 142 116 c
f
142 116 m
222 116 l
222 116 222 116 222 116 c
222 136 l
222 136 222 136 222 136 c
142 136 l
142 136 142 136 142 136 c
142 116 l
142 116 142 116 142 116 c
S

This is a mix of lines and Bezier curves. But the Bezier curves do
nothing: source, control points and target are identical. The fill ("f")
is done in white, the strokes ("S") are done in grey.
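
To make that concrete, here is a minimal sketch in plain Java (no PDFBox
classes) of buffering segments until the paint operator arrives, so that each
segment is recorded with how it was painted and in which color. PathBuffer,
Segment and all method names are hypothetical and only mirror the operators
m, l, c, f and S from the stream above; this is not how PageDrawer is
implemented. Curves that are not degenerate are only counted, not flattened.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: it buffers path segments and only
// records them once a paint operator ("f" or "S") says how they are drawn.
class PathBuffer {
    static final class Segment {
        final Point2D.Double from;
        final Point2D.Double to;
        final boolean stroked; // true for "S", false for "f"
        final float[] rgb;     // color that was active for that operator

        Segment(Point2D.Double from, Point2D.Double to, boolean stroked, float[] rgb) {
            this.from = from;
            this.to = to;
            this.stroked = stroked;
            this.rgb = rgb;
        }
    }

    private final List<Point2D.Double[]> pending = new ArrayList<>();
    private final List<Segment> painted = new ArrayList<>();
    private Point2D.Double current;
    private int skippedCurves;

    void moveTo(double x, double y) {
        current = new Point2D.Double(x, y);
    }

    void lineTo(double x, double y) {
        Point2D.Double target = new Point2D.Double(x, y);
        if (current != null && !current.equals(target)) {
            pending.add(new Point2D.Double[] { current, target });
        }
        current = target;
    }

    // True when start, both control points and end are the same point,
    // as in "222 116 222 116 222 116 c" right after "222 116 l".
    boolean isDegenerate(double x1, double y1, double x2, double y2, double x3, double y3) {
        return current != null
                && current.getX() == x1 && current.getY() == y1
                && x1 == x2 && y1 == y2 && x2 == x3 && y2 == y3;
    }

    void curveTo(double x1, double y1, double x2, double y2, double x3, double y3) {
        // Degenerate curves add nothing. Real curves are NOT flattened in this
        // sketch; they are only counted and move the current point.
        if (!isDegenerate(x1, y1, x2, y2, x3, y3)) {
            skippedCurves++;
        }
        current = new Point2D.Double(x3, y3);
    }

    void fill(float r, float g, float b) {
        flush(false, r, g, b);
    }

    void stroke(float r, float g, float b) {
        flush(true, r, g, b);
    }

    // A paint operator ends the current path: tag and store what was buffered.
    private void flush(boolean stroked, float r, float g, float b) {
        for (Point2D.Double[] s : pending) {
            painted.add(new Segment(s[0], s[1], stroked, new float[] { r, g, b }));
        }
        pending.clear();
        current = null;
    }

    List<Segment> getPainted() {
        return painted;
    }

    int getSkippedCurves() {
        return skippedCurves;
    }

    public static void main(String[] args) {
        PathBuffer buffer = new PathBuffer();
        double[][] corners = { { 222, 116 }, { 222, 136 }, { 142, 136 }, { 142, 116 } };
        for (int pass = 0; pass < 2; pass++) {
            buffer.moveTo(142, 116);
            for (double[] c : corners) {
                buffer.lineTo(c[0], c[1]);
                buffer.curveTo(c[0], c[1], c[0], c[1], c[0], c[1]);
            }
            if (pass == 0) {
                buffer.fill(1f, 1f, 1f);          // 1 1 1 rg, then f
            } else {
                buffer.stroke(0.2f, 0.2f, 0.2f);  // 0.2 0.2 0.2 RG, then S
            }
        }
        for (Segment s : buffer.getPainted()) {
            System.out.println((s.stroked ? "S " : "f ") + s.from + " -> " + s.to);
        }
    }
}
```

With the stream above this gives four segments tagged as filled in white and
the same four tagged as stroked in grey. A line collector could then keep only
the stroked ones, if that is what is wanted.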

Sadly I haven't understood your question. I hope this helps a bit.

Tilman



>
>
>
>
>
> -----Original Message-----
> From: Cem Dayanik (Ibtech-Software Infrastructure)
> Sent: Tuesday, January 17, 2017 11:06 AM
> To: users@pdfbox.apache.org
> Subject: RE: fake lines
>
> The stroking color is the same for every line.
> Comparing Graphics.color with the background color didn't work.
>
> I tried something like discardLines4Rectangele(p0, p1, p2, p3) at the start of appendRectangle,
> in case some rectangle covers the line, but no luck.
>
> I am not using linePath directly; this is what my code looks like.
>
>
>          @Override
>          public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) {
>                  //discardLines4Rectangele(p0, p1, p2, p3);
>                  this.lineList.add(new LineInfo(new Point2D.Double(p0.getX(), p0.getY()),
>                          new Point2D.Double(p1.getX(), p1.getY()), this.getGraphicsState().getLineWidth()));
>                  this.lineList.add(new LineInfo(new Point2D.Double(p1.getX(), p1.getY()),
>                          new Point2D.Double(p2.getX(), p2.getY()), this.getGraphicsState().getLineWidth()));
>                  this.lineList.add(new LineInfo(new Point2D.Double(p2.getX(), p2.getY()),
>                          new Point2D.Double(p3.getX(), p3.getY()), this.getGraphicsState().getLineWidth()));
>                  this.lineList.add(new LineInfo(new Point2D.Double(p3.getX(), p3.getY()),
>                          new Point2D.Double(p0.getX(), p0.getY()), this.getGraphicsState().getLineWidth()));
>                  super.appendRectangle(p0, p1, p2, p3);
>          }
>
>
>          @Override
>          public void lineTo(float x, float y) {
>                  Point2D currentPoint = this.getCurrentPoint();
>                  //Graphics2D graphics = this.getGraphics();
>                  //PDGraphicsState state = this.getGraphicsState();
>                  this.lineList.add(new LineInfo(new Point2D.Double(currentPoint.getX(), currentPoint.getY()),
>                          new Point2D.Double((double)x, (double)y), this.getGraphicsState().getLineWidth()));
>                  super.lineTo(x, y);
>          }
>
>
>
> Any other ideas?
> Is it possible to debug where those lines are "ignored" while rendering?
> They are buffered, right? Does the actual rendering happen after all of this is done?
>
> Note: I can't find any working upload server at the moment (annoying company policy).
>
>
>
>
> -----Original Message-----
> From: Tilman Hausherr [mailto:THausherr@t-online.de]
> Sent: Tuesday, January 17, 2017 10:03 AM
> To: users@pdfbox.apache.org
> Subject: Re: fake lines
>
> Attachments usually don't go through, you'd have to upload them somewhere.
>
> In PageDrawer.java, the lines are in "linePath". For stroke it's the stroking color.
>
> Tilman
>
> On 17.01.2017 at 07:45, Cem Dayanik (Ibtech-Software Infrastructure) wrote:
>> Hello everyone,
>>
>> I need to extract table data from a PDF.
>>
>> I know there are different approaches for that, but the table has
>> “gridlines”, so I need an exact solution.
>>
>> My problem is that when I parse the PDF with a page drawer, there are
>> some lines that are not actually visible in the PDF.
>>
>> I need to discard them, but I couldn't find out how.
>>
>> Obviously there is some hidden information in “graphics/graphicsstate”
>>
>> (other than width and background/foreground color).
>>
>> Please see attachments for clarification.
>>
>> Any help would be appreciated.
>>
>> Thanks.
>>
>> These are not “raw” lines but “combined” line info. The bold one is the
>> one I need to get rid of (actually a set of lines, not a single one).
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 276.0] Point2D.Double[565.0, 276.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 311.0] Point2D.Double[565.0, 311.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 256.0] Point2D.Double[565.0, 256.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 236.0] Point2D.Double[565.0, 236.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 216.0] Point2D.Double[565.0, 216.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 196.0] Point2D.Double[565.0, 196.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 176.0] Point2D.Double[565.0, 176.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 156.0] Point2D.Double[565.0, 156.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 136.0] Point2D.Double[565.0, 136.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 116.0] Point2D.Double[565.0, 116.0]
>>
>> HORIZONTALLINE:Point2D.Double[31.0, 108.5] Point2D.Double[564.0, 108.5]
>>
>> VERTICALLINE:Point2D.Double[565.0, 116.0] Point2D.Double[565.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[31.0, 116.0] Point2D.Double[31.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[51.0, 116.0] Point2D.Double[51.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[95.0, 116.0] Point2D.Double[95.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[222.0, 116.0] Point2D.Double[222.0, 311.0]
>>
>> *VERTICALLINE:Point2D.Double[142.0, 116.0] Point2D.Double[142.0, 276.0]*
>>
>> VERTICALLINE:Point2D.Double[247.0, 116.0] Point2D.Double[247.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[287.0, 116.0] Point2D.Double[287.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[310.0, 116.0] Point2D.Double[310.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[339.0, 116.0] Point2D.Double[339.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[369.0, 116.0] Point2D.Double[369.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[402.0, 116.0] Point2D.Double[402.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[452.0, 116.0] Point2D.Double[452.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[432.0, 116.0] Point2D.Double[432.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[507.0, 116.0] Point2D.Double[507.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[537.0, 116.0] Point2D.Double[537.0, 311.0]
>>
>> VERTICALLINE:Point2D.Double[147.0, 116.0] Point2D.Double[147.0, 311.0]
>>
>>
>>
>>
>>
>> The information contained in this e-mail (including any attachments)is
>> confidential. It must not be disclosed to any person without our
>> authority. If you are not the intended recipient, please delete it
>> from your system immediately. IBTech A.S. makes no warranty as to the
>> accuracy or completeness of any information contained in this message
>> and hereby excludes any liability of any kind for the information
>> contained therein or for the information transmission, reception,
>> storage or use of such in any way whatsoever. Any opinions expressed
>> in this message are those of the author and may not necessarily
>> reflect the opinions of IBTech A.S.
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org



