pdfbox-users mailing list archives

From "Amir H. Jadidinejad" <amir.jad...@yahoo.com.INVALID>
Subject Problem with mixed RTL/LTR pdfs
Date Sat, 02 Aug 2014 20:45:00 GMT
I can extract the content of monolingual PDF files using the following code:
        PDDocument doc = PDDocument.load(file);
        PDFTextStripper stripper = new PDFTextStripper();
        String txt = stripper.getText(doc);
        doc.close(); // release the parsed document once the text is extracted

This works perfectly when the input document is monolingual.

The problem is that when the input document mixes right-to-left and left-to-right
languages, the characters of one of the languages come out in reverse order!
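To illustrate the symptom: a common cause is that the right-to-left text is stored in the PDF in visual order (glyph by glyph as it appears on the page, left to right), so a stripper that reads glyphs by position yields each RTL word backwards. The sketch below is a hypothetical post-processing helper, not part of the PDFBox API; it uses the JDK's java.text.Bidi to find right-to-left runs in an extracted line and reverses each one. It assumes simple lines with no nested runs (for example, no digits or Latin words embedded inside the RTL text) and no combining marks, which a plain character reversal would misplace.

```java
import java.text.Bidi;

// Hypothetical helper (not part of PDFBox): turns a line extracted in visual
// order back into logical order by reversing every right-to-left run.
// Assumption: RTL runs are not nested (no embedded numbers or LTR words),
// and the text has no combining marks.
class BidiFix {

    static String toLogicalOrder(String visualLine) {
        // Analyze the line with a left-to-right base direction.
        Bidi bidi = new Bidi(visualLine, Bidi.DIRECTION_LEFT_TO_RIGHT);
        if (bidi.isLeftToRight()) {
            return visualLine; // purely LTR line: nothing to fix
        }
        StringBuilder out = new StringBuilder(visualLine.length());
        for (int run = 0; run < bidi.getRunCount(); run++) {
            String text = visualLine.substring(bidi.getRunStart(run), bidi.getRunLimit(run));
            // Odd embedding levels mark right-to-left runs: reverse them.
            if ((bidi.getRunLevel(run) & 1) == 1) {
                text = new StringBuilder(text).reverse().toString();
            }
            out.append(text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "hello", the Hebrew word "shalom" stored in visual order, then "world"
        String visual = "hello \u05DD\u05D5\u05DC\u05E9 world";
        System.out.println(toLogicalOrder(visual));
    }
}
```

Reversing whole runs is only the simplest case of the Unicode bidirectional algorithm; lines that mix digits or Latin text inside RTL passages would need full level-based reordering instead.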

A sample bilingual pdf document is attached.

Could you please help me with this issue?
