pdfbox-users mailing list archives

From Gregor Kovač <kov...@gmail.com>
Subject Re: Spacing between lines not retained
Date Fri, 29 Jul 2016 11:21:17 GMT
Hi!

Yes, if it behaves that way, it looks like a bug to me too. One of the
developers would have to look at it to confirm, though.

Best regards,
    Kovi

2016-07-29 13:19 GMT+02:00 Shyam Sundar <sw.craftsman@gmail.com>:

> Thanks, Kovi, for the quick response.
>
> But why does it fail only for one particular file? A replica of the same
> file generated with another PDF library works perfectly fine with
> PDFTextStripper. Isn't that strange, and doesn't it look like a bug?
>
> I hope you checked the shared Sample.zip; it contains both the working and
> the non-working files.
>
> Regards.
>
> On Fri, Jul 29, 2016 at 4:30 PM, Gregor Kovač <kovica@gmail.com> wrote:
>
> > Hi!
> >
> > The API docs for PDFTextStripper
> > (http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html)
> > state that "This class will take a pdf document and strip out all of the
> > text and ignore the formatting and such". Note that you can call
> > setAddMoreFormatting
> > (http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html#setAddMoreFormatting(boolean))
> > with true to get a bit more formatting, but in my experience this does
> > not compare to "pdftotext -layout" from the Xpdf project, which does a
> > much better job of preserving layout.
> >
> > Best regards,
> >     Kovi
> >
> > 2016-07-29 12:44 GMT+02:00 Shyam Sundar <sw.craftsman@gmail.com>:
> >
> > > Hi,
> > >
> > > While converting a particular PDF to text, the spacing between lines and
> > > paragraphs is not retained; the output is just flat text.
> > >
> > > Sample file : ftp://PfXxyEhxh:h7hHhpOh7O@ftp.emc.com/Sample.zip
> > >
> > > It looks like a file-specific issue. Can you please check?
> > >
> > > Thanks.
> > >
> > >
> >
> >
> >
> > --
> > -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
> > |  In A World Without Fences Who Needs Gates?  |
> > |              Experience Linux.               |
> > -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
> >
>
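
For anyone who wants to try the "pdftotext -layout" route from the quoted
reply above from Java, here is a minimal sketch. It assumes the Xpdf
pdftotext binary is installed and on the PATH. The class and method names
(PdfToTextLayout, buildCommand, convert) are made up for illustration; only
the -layout flag comes from this thread.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class PdfToTextLayout {

    // Builds the command line: pdftotext -layout <input.pdf> <output.txt>
    // (assumes "pdftotext" can be resolved via the PATH).
    public static List<String> buildCommand(Path pdf, Path txt) {
        return Arrays.asList("pdftotext", "-layout", pdf.toString(), txt.toString());
    }

    // Runs pdftotext and returns its exit code (0 means no error).
    public static int convert(Path pdf, Path txt) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, txt))
                .inheritIO() // forward pdftotext's own messages to the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PdfToTextLayout <input.pdf> <output.txt>");
            return;
        }
        int exitCode = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("pdftotext exited with code " + exitCode);
    }
}
```

Shelling out keeps the layout handling entirely in pdftotext, at the cost of
an external dependency; comparing its output with what PDFTextStripper
produces for the same file may also help narrow down a file-specific problem.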



