pdfbox-users mailing list archives

From Tapani Vaulasto <tapani.vaula...@gmail.com>
Subject Re: Text Extract keeping format and layout intact as pdf
Date Mon, 28 Oct 2013 10:44:00 GMT
You can try AsXML.java or PDFTextStream, if they suit you. I have used them to
get tables from PDFs.
AsXML.java produces a lot of lines of output.
PDFTextStream has been made from PDFBox.
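
As a rough illustration of why position data helps with tables, here is a minimal sketch, assuming the extraction tool can give you each piece of text together with its x/y position on the page. It is not the API of PDFBox, AsXML.java or PDFTextStream: the names LayoutSketch, Fragment and render, and the lineHeight and charWidth parameters, are hypothetical and exist only for this example. Fragments are grouped into rows by y and padded with spaces to a column derived from x.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LayoutSketch {

    /** Hypothetical input type: a piece of text and the position of its left edge. */
    static final class Fragment {
        final double x;     // horizontal position, in points from the left edge
        final double y;     // vertical position, in points from the top edge
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Places fragments on a character grid.
     * lineHeight and charWidth are assumed, fixed values in points;
     * real documents usually need them estimated per page or per font.
     */
    static String render(List<Fragment> fragments, double lineHeight, double charWidth) {
        // Group fragments by line index; TreeMap keeps rows ordered top to bottom.
        Map<Long, List<Fragment>> rows = new TreeMap<>();
        for (Fragment f : fragments) {
            long row = Math.round(f.y / lineHeight);
            rows.computeIfAbsent(row, k -> new ArrayList<>()).add(f);
        }

        StringBuilder out = new StringBuilder();
        for (List<Fragment> row : rows.values()) {
            // Within a row, order fragments left to right.
            row.sort((a, b) -> Double.compare(a.x, b.x));
            StringBuilder line = new StringBuilder();
            for (Fragment f : row) {
                int col = (int) Math.round(f.x / charWidth);
                // If earlier text already reaches this column, keep one space between cells.
                if (line.length() > 0 && line.length() >= col) {
                    line.append(' ');
                }
                // Otherwise pad with spaces up to the target column.
                while (line.length() < col) {
                    line.append(' ');
                }
                line.append(f.text);
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fragments deliberately given out of reading order.
        List<Fragment> cells = Arrays.asList(
                new Fragment(60, 10, "Qty"),
                new Fragment(0, 22, "Bolt"),
                new Fragment(0, 10, "Name"),
                new Fragment(60, 22, "12"));
        System.out.print(render(cells, 12.0, 6.0));
    }
}
```

With real PDFs the hard part is choosing lineHeight and charWidth, because fonts and spacing vary, so treat this only as a starting point.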


2013/10/28 Abhishek Pawar <Abhishek.Pawar@lntinfotech.com>

> Team,
> How can I extract text from a PDF file or PDF page with its formatting
> and layout intact?
> This is important because if a PDF contains data in a tabular format,
> the result after text extraction using PDFBox becomes very messy.
>
> Please help!
>
> Regards,
> Abhishek Pawar.
> L&T Infotech



-- 
Kind regards,
Tapani Vaulasto
p. +35845 6791830
