pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Keeping data format intact after converting pdf file into txt
Date Tue, 13 Nov 2018 11:14:41 GMT
Hi,

please try the sort option.
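
In the Java API the sort option is PDFTextStripper.setSortByPosition(true), and in the ExtractText command line tool it is the -sort flag. To show roughly why it helps, here is a small stand-alone sketch. It is not PDFBox code: the class PositionSortSketch, the Fragment type, and the rowTolerance parameter are made up for illustration. It assumes the PDF stores a table column by column, so the text comes out "vertically" unless the pieces are reordered by their position on the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of position-based sorting; not part of PDFBox.
class PositionSortSketch {

    // A piece of text and the page coordinates where it starts (y grows downward).
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Groups fragments into rows by similar y, orders each row by x,
    // and joins the cells of a row with tabs.
    static String layout(List<Fragment> fragments, double rowTolerance) {
        List<Fragment> byY = new ArrayList<>(fragments);
        byY.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        for (Fragment f : byY) {
            List<Fragment> last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (last != null && Math.abs(f.y - last.get(0).y) <= rowTolerance) {
                last.add(f); // close enough to the current row's first fragment
            } else {
                List<Fragment> row = new ArrayList<>();
                row.add(f);
                rows.add(row);
            }
        }

        StringBuilder out = new StringBuilder();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            for (Fragment f : row) {
                cells.add(f.text);
            }
            out.append(String.join("\t", cells)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fragments in the order a column-by-column content stream might yield them.
        List<Fragment> streamOrder = Arrays.asList(
                new Fragment("Name", 50, 100),
                new Fragment("Alice", 50, 120),
                new Fragment("Bob", 50, 140),
                new Fragment("Age", 200, 100),
                new Fragment("30", 200, 120.5),
                new Fragment("25", 200, 140));
        System.out.print(layout(streamOrder, 2.0));
    }
}
```

Without the reordering, the six fragments would be written out one per line in stream order. The sketch only handles clean rows; merged cells, wrapped cell text, or rotated text would need more than a sort.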

However, it will never be perfect. For tables, use a tool like tabula-java.

Tilman






--- Original message ---
From: Vinayaka Dalwai
Subject: Keeping data format intact after converting pdf file into txt
Date: 13.11.2018, 11:44
To: users@pdfbox.apache.org





Hi :) ,
I have been converting many PDF files into txt files, thanks to Apache
PDFBox.
However, I recently came across a PDF file that, after conversion to txt,
does not retain the layout it had in the PDF. The data is completely
detached from the tables and all of it appears vertically.
Is there any way I can retain the layout and tables? Any help on this would
be much appreciated.

Thanks & Regards,
Vinayaka
