pdfbox-users mailing list archives

From CDB <cbu...@burkeitconsulting.com>
Subject COSString Selection
Date Fri, 08 Jun 2012 17:05:06 GMT
I am having trouble with the getText method (as used in ExtractText) because it
concatenates all of the text together.
I would like to go a step deeper, pull out each COSString value, and delimit them.
Below is the code I am using so far to get all of the text.

I am not 
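        PDDocument doc = null;  // assumed to be declared ahead of the try block
        try {
            PDFTextStripper pdfTextStripper = new PDFTextStripper();
            doc = PDDocument.load( stream );
            // getText returns the text of the whole document as one string
            return pdfTextStripper.getText( doc );
        } finally {
            // assumed cleanup: release the document once the text is extracted
            if ( doc != null ) {
                doc.close();
            }
        }
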
I noticed that the logs show the operators and their types, but some strings
are broken up into multiple COSString fields within arrays.

Which methods can I use to traverse all of the fields and
select the COSStrings?
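
As an illustration of the splitting described above: a show-text operator such as TJ takes an array that mixes string pieces with numeric position adjustments, so one visible word can arrive as several separate strings. The sketch below is plain Java and does not use PDFBox's own parser. The `TjStrings` class, its methods, and the sample fragment are all hypothetical, and it handles only literal strings in parentheses (no hex strings, no octal escapes).

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a tiny stand-in scanner, not PDFBox's content stream parser.
final class TjStrings {

    // Collects each literal string "( ... )" found in a content-stream fragment.
    // Hex strings "<...>" and octal escapes are deliberately not handled.
    static List<String> extract(String fragment) {
        List<String> parts = new ArrayList<String>();
        StringBuilder current = null; // non-null while inside a literal string
        int depth = 0;                // nesting level of unescaped parentheses
        for (int i = 0; i < fragment.length(); i++) {
            char c = fragment.charAt(i);
            if (current == null) {
                // Outside a string: skip numbers, brackets, and operators.
                if (c == '(') {
                    current = new StringBuilder();
                    depth = 1;
                }
            } else if (c == '\\' && i + 1 < fragment.length()) {
                // Backslash escape: keep the next character as-is,
                // so \( yields '(' (and \n is not turned into a newline).
                current.append(fragment.charAt(++i));
            } else if (c == '(') {
                depth++;
                current.append(c);
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    parts.add(current.toString());
                    current = null;
                } else {
                    current.append(c);
                }
            } else {
                current.append(c);
            }
        }
        return parts;
    }

    // Joins the pieces with a delimiter so the original boundaries stay visible.
    static String delimit(List<String> parts, String delimiter) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                out.append(delimiter);
            }
            out.append(parts.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Invented sample: one phrase split into three string pieces.
        List<String> parts = extract("[(Hel) -20 (lo W) 15 (orld)] TJ");
        System.out.println(delimit(parts, "|")); // prints Hel|lo W|orld
    }
}
```

With a real document the same idea would apply to the tokens produced by the library's content stream parser rather than to raw text: walk the operands, descend into arrays, and keep only the string values.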

