pdfbox-users mailing list archives

From Kay_Lee <heruri...@hotmail.com>
Subject Hello, I have a question in extracting Texts from PDF file.
Date Wed, 18 May 2016 02:21:00 GMT
I live in South Korea, in East Asia, and I'm using Apache PDFBox to extract text
from PDF files.
Name: Su-Sang, Lee (English name: Kay Lee)
Cell Phone: +82-10-3180-7976
Residence: Seoul, South Korea, Asia
E-mail: herurider@hotmail.com (or herurider@gmail.com)
My software development environment is:
Windows 10, Visual Studio 2015, C#, PDFBox version 1.1.1 (a build of the Apache PDFBox library as
.NET binaries, available as a NuGet package).
I can extract text (in our Korean language) from PDF files, with many thanks to the Apache Foundation.
However, my main concern is that PDFBox takes a little longer to extract text than
iTextSharp and other competitors.
All I need is to extract Korean text from PDF files, nothing more.

I tried searching the internet (Google, Stack Overflow) but found no specific solution, only
a few limited cases.

1) How can I extract text faster?
2) Do I need the whole library, with more than 30 MB of files, if I only need to extract text?
If I only need some specific DLL files among all the PDFBox DLL files, could you
please kindly let me know which ones?

3) Is it still OK to use PDFBox 1.1.1? There seem to be more recent versions, like 1.8.12 and 2.0.1.
I don't belong to any company or organization; I'm just a private person developing
software to be distributed and used for free for 5 years, for the public benefit. As my major
is not software-related but biochemistry, please kindly understand and explain in
as much detail as you can.

My simple code to extract text from a PDF file is:

internal static string ExtractTextFromPdf(string path)
{
    PDDocument doc = null;
    try
    {
        doc = PDDocument.load(path);
        PDFTextStripper stripper = new PDFTextStripper();
        return stripper.getText(doc);
    }
    finally { if (doc != null) doc.close(); }  // always release the document
}
I hope for your kind and excellent support.

Thank you so much!

Mr. Su-Sang, Lee (Kay Lee)