pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: Question re PDFTextStripper
Date Sun, 02 Jan 2011 19:03:23 GMT
Hi,

On 01.01.2011 03:18, Alan Thomas wrote:
> I am trying to understand the PDFTextStripper class.
>
> Here is where I get lost: Its processPage method calls
> PDFStreamEngine's processStream method, which calls the processSubStream
> method of PDFStreamEngine. The processSubStream method calls the
> processOperator method, which uses the process method (among others) of the
> OperatorProcessor class. However, the OperatorProcessor class is abstract,
> and the process method is declared abstract. I cannot find where
> this abstract class is subclassed.
>
> Can anyone point me in the right direction?
You can find the concrete OperatorProcessor subclasses for all supported
operators in these packages:

org.apache.pdfbox.util.operator
org.apache.pdfbox.util.operator.pagedrawer (only needed for rendering)

The property file PDFTextStripper.properties [1] lists all operators that are
needed for text extraction, and PageDrawer.properties [2] lists all that are
needed for rendering.
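
To illustrate the general idea, here is a minimal, self-contained sketch of a
property-driven operator lookup. It is not PDFBox's actual code: the class
OperatorDispatchSketch, its nested classes and the methods load, run and
demoProperties are all made up for this example, and the assumption is that
each property maps an operator name to the class name of a processor, which is
then instantiated by reflection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class OperatorDispatchSketch {

    /** Hypothetical stand-in for the abstract OperatorProcessor class. */
    public abstract static class OperatorProcessor {
        public abstract String process(String operator);
    }

    /** Hypothetical concrete processor for the "BT" operator. */
    public static class BeginText extends OperatorProcessor {
        @Override
        public String process(String operator) {
            return "begin text object";
        }
    }

    /** Hypothetical concrete processor for the "Tj" operator. */
    public static class ShowText extends OperatorProcessor {
        @Override
        public String process(String operator) {
            return "show text string";
        }
    }

    /** Instantiates the class named by each property value and keys it by operator. */
    public static Map<String, OperatorProcessor> load(Properties props) {
        Map<String, OperatorProcessor> processors = new HashMap<>();
        for (String operator : props.stringPropertyNames()) {
            try {
                Class<?> cls = Class.forName(props.getProperty(operator));
                processors.put(operator,
                        (OperatorProcessor) cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load processor for " + operator, e);
            }
        }
        return processors;
    }

    /** Dispatches each operator to its processor; operators without one are skipped. */
    public static List<String> run(Map<String, OperatorProcessor> processors,
                                   List<String> operators) {
        List<String> log = new ArrayList<>();
        for (String operator : operators) {
            OperatorProcessor processor = processors.get(operator);
            log.add(processor == null
                    ? operator + ": ignored"
                    : operator + ": " + processor.process(operator));
        }
        return log;
    }

    /** Built in code here; in practice the mapping would come from a .properties file. */
    public static Properties demoProperties() {
        Properties props = new Properties();
        props.setProperty("BT", BeginText.class.getName());
        props.setProperty("Tj", ShowText.class.getName());
        return props;
    }

    public static void main(String[] args) {
        Map<String, OperatorProcessor> processors = load(demoProperties());
        for (String line : run(processors, List.of("BT", "Tj", "re"))) {
            System.out.println(line);
        }
    }
}
```

In this sketch the "re" operator has no entry in the properties, so it is
simply skipped. That is one way a text-extraction configuration can list fewer
operators than a rendering configuration.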

Best regards,
Andreas Lehmkühler

[1] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/resources/org/apache/pdfbox/resources/PDFTextStripper.properties
[2] http://svn.apache.org/repos/asf/pdfbox/trunk/pdfbox/src/main/resources/org/apache/pdfbox/resources/PageDrawer.properties
