pdfbox-users mailing list archives

From CDB <cbu...@burkeitconsulting.com>
Subject COSString Selection
Date Fri, 08 Jun 2012 17:05:06 GMT
I am having trouble with the getText method (as used by ExtractText): it
concatenates all of the text together.
I would like to go a step deeper, pull out each COSString value, and delimit
them.
Below is the code I am using so far to get all of the text.

I am not 
        try {
            PDFTextStripper pdfTextStripper = new PDFTextStripper();
            doc = PDDocument.load( stream );
            return pdfTextStripper.getText(doc);
        } finally {
            quietlyClose(doc);
        }


I noticed that the logs show the operators and their types, but some strings
are broken up into multiple COSString fields within arrays.

What methods can I use to traverse all of the fields and select the
COSStrings?
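
To make the request concrete, here is a rough sketch of the traversal I have in mind. It is not PDFBox code: the token list is modeled with plain Java objects, and the names StringOperandCollector, Op, collectStrings and joinStrings are made up for illustration. A String stands in for a COSString, a List for an array operand (as passed to the TJ operator, where strings are mixed with numeric position adjustments), and Op for an operator token. In PDFBox the tokens would presumably come from the content stream parser (PDFStreamParser) as COSString and COSArray objects, but the exact calls vary between versions, so that part is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox. It models parsed content-stream
// tokens as plain Java objects:
//   String -> a string operand (stand-in for COSString)
//   List   -> an array operand (stand-in for COSArray, e.g. the TJ operand)
//   Number -> a position adjustment inside a TJ array
//   Op     -> an operator token such as Tj or TJ
final class StringOperandCollector {

    /** Hypothetical stand-in for an operator token. */
    static final class Op {
        final String name;

        Op(String name) {
            this.name = name;
        }
    }

    private StringOperandCollector() {
    }

    /** Returns every string operand in document order, descending into arrays. */
    static List<String> collectStrings(List<?> tokens) {
        List<String> result = new ArrayList<String>();
        collectInto(tokens, result);
        return result;
    }

    /** Joins the collected string operands with the given delimiter. */
    static String joinStrings(List<?> tokens, String delimiter) {
        return String.join(delimiter, collectStrings(tokens));
    }

    private static void collectInto(List<?> tokens, List<String> out) {
        for (Object token : tokens) {
            if (token instanceof String) {
                // A string operand: keep it as its own entry.
                out.add((String) token);
            } else if (token instanceof List) {
                // An array operand: strings may be split across its elements.
                collectInto((List<?>) token, out);
            }
            // Numbers and operators carry no text, so they are skipped.
        }
    }
}
```

With a token list such as "Hello", Tj, ["Wor", -120, "ld"], TJ, calling joinStrings(tokens, "|") would keep each string separate instead of running them together, which is the delimiting behaviour I am after.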

Thanks


