pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Corrupted words when using PDFTextStripper
Date Mon, 09 Jun 2014 10:18:30 GMT
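This could be an OCRed file. Try copying and pasting the text from Acrobat
Reader to see whether you get the same result. If you do, the problem is in
the text layer of the file itself and not in PDFTextStripper.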
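
A minimal sketch of that comparison, assuming you have the PDFTextStripper
output and the text pasted from Acrobat Reader as two strings. The class and
method names (ExtractionCompare, normalize, firstDifference) are made up for
illustration and are not part of PDFBox. Line breaks and spacing usually
differ between the two sources, so whitespace is collapsed before comparing.

```java
final class ExtractionCompare {
    private ExtractionCompare() {}

    // Collapse every run of whitespace (spaces, tabs, line breaks) into a
    // single space so that layout differences do not count as mismatches.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the index (in the normalized text) of the first character that
    // differs between the two extractions, or -1 if they are the same.
    static int firstDifference(String fromStripper, String fromViewer) {
        String x = normalize(fromStripper);
        String y = normalize(fromViewer);
        int n = Math.min(x.length(), y.length());
        for (int i = 0; i < n; i++) {
            if (x.charAt(i) != y.charAt(i)) {
                return i;
            }
        }
        // One text is a prefix of the other: they differ where the shorter ends.
        return x.length() == y.length() ? -1 : n;
    }
}
```

If firstDifference returns -1, both tools read the same (faulty) text from the
file, which points at the file. Any other value means the two extractions
disagree and PDFTextStripper is worth a closer look.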

Tilman

On 09.06.2014 11:55, Walter Kehl wrote:
> Hi,
>
> I am new to the list, so I don't know whether this has been asked before.
>
> I am using PDFTextStripper (embedded into another application) to get the
> raw text of PDFs, so far with good results. Recently, however, a PDF file
> appeared for which the output of PDFTextStripper was corrupted. I got
> sentences like:
>
> "There is al o con ern that b nkers may be pushed to misprice risk (No. 6)
> by the pres ures of c mpetition and an abunda ce of central b nk-provided
> liquidity."
>
> where characters seem to be missing. Does anyone have any idea what went
> wrong here and how I could prevent it?
>
> Thanks for your help
>
> Walter Kehl

