pdfbox-users mailing list archives

From "Zhang, Lisheng" <Lisheng.Zh...@BroadVision.com>
Subject RE: getText() performance in PDFBox 1.5 release
Date Fri, 04 Nov 2011 23:02:28 GMT
Thanks very much for pointing that out!!! 

I downloaded Tika 0.10 a few days ago, and the attached CHANGES.txt did
not mention PDFBox 1.6. Based on that CHANGES.txt, I thought Tika
used 1.4.

I will download PDFBox 1.6 and retest.

Best regards, Lisheng

-----Original Message-----
From: Andreas Lehmkuehler [mailto:andreas@lehmi.de]
Sent: Friday, November 04, 2011 3:40 PM
To: users@pdfbox.apache.org
Subject: Re: getText() performance in PDFBox 1.5 release


Hi,

On 04.11.2011 20:34, Zhang, Lisheng wrote:
> Hi Mike,
>
> Thanks very much. I tested and the result is the same. From the source
> code it seems that the suppressDuplicateOverlappingText parameter has
> no effect if I call PDFTextStripper.getText(..) directly. I will
> check further to see if I can use the method processEncodedText(..).
>
> Which version of PDFBox did you use (Tika has not used PDFBox 1.5 yet)?
According to [1], Tika 0.10 uses PDFBox 1.6, which includes some
performance-related improvements.

> Best regards, Lisheng
> <SNIP>


BR
Andreas Lehmkühler
[1] http://www.apache.org/dist/tika/CHANGES-0.10.txt
