From commits-return-13308-archive-asf-public=cust-asf.ponee.io@pdfbox.apache.org Sat Oct 20 10:12:45 2018
A page in a PDF document is represented with a COSDictionary. The entries that are available for a page can be seen in the PDF Reference and an example of a page looks like this:
-<<
+<<
/Type /Page
/MediaBox [0 0 612 915]
/Contents 56 0 R
>>
-
-
+
The information within the dictionary can be accessed using the COS model
-COSDictionary page = ...;
+COSDictionary page = ...;
COSArray mediaBox = (COSArray)page.getDictionaryObject( "MediaBox" );
// MediaBox is [llx lly urx ury]; index 2 is the upper-right x, which is the width when llx is 0
System.out.println( "Width:" + mediaBox.get( 2 ) );
-
-
+
As can be seen from that little example the COS model provides a low level API to access information within the PDF. In order to use the COS model successfully a good knowledge of
@@ -305,11 +303,10 @@ available to access the attributes.
The same code from above to get the page width can be rewritten to use PD Model classes.
-PDPage page = ...;
+PDPage page = ...;
PDRectangle mediaBox = page.getMediaBox();
System.out.println( "Width:" + mediaBox.getWidth() );
-
-
+
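To make the layering between a raw array and a higher-level accessor concrete, here is a minimal, library-free sketch. It assumes the usual PDF rectangle layout [llx lly urx ury], so the width is the third entry minus the first. RectangleView is a hypothetical stand-in for a PD Model style wrapper and is not a PDFBox class.

```java
import java.util.Arrays;

// Hypothetical stand-in for a PD Model style wrapper; not part of PDFBox.
final class RectangleView {
    // Raw values laid out as [llx, lly, urx, ury], like a MediaBox array.
    private final float[] raw;

    RectangleView(float[] raw) {
        if (raw.length != 4) {
            throw new IllegalArgumentException("expected 4 entries, got " + raw.length);
        }
        // Keep a reference, not a copy: the view reads and writes the original data.
        this.raw = raw;
    }

    float getWidth() {
        return raw[2] - raw[0];
    }

    float getHeight() {
        return raw[3] - raw[1];
    }

    // The setter writes through to the wrapped array.
    void setUpperRightX(float value) {
        raw[2] = value;
    }

    public static void main(String[] args) {
        float[] mediaBox = {0f, 0f, 612f, 915f};
        RectangleView view = new RectangleView(mediaBox);
        System.out.println("Width:" + view.getWidth());
        System.out.println("Height:" + view.getHeight());
        view.setUpperRightX(500f);
        System.out.println(Arrays.toString(mediaBox));
    }
}
```

The wrapper holds no state of its own, so a change made through it is immediately visible to code that still works with the raw array.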
PD Model objects sit on top of COS model. Typically, the classes in the PD Model will only store a COS object and all setter/getter methods will modify data that is stored in the
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/1d572c9f/content/1.8/commandline.html
----------------------------------------------------------------------
diff --git a/content/1.8/commandline.html b/content/1.8/commandline.html
index 201b214..56f2ad7 100644
--- a/content/1.8/commandline.html
+++ b/content/1.8/commandline.html
@@ -110,7 +110,7 @@
This small sample shows how to create a new PDF document using PDFBox.
-// Create a new empty document
+// Create a new empty document
PDDocument document = new PDDocument();
// Create a new blank page and add it to the document
@@ -176,14 +176,13 @@
// finally make sure that the document is properly
// closed.
document.close();
-
-
+
This small sample shows how to create a new document and print the text “Hello World” using one of the PDF base fonts.
-// Create a document and add a page to it
+// Create a document and add a page to it
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage( page );
@@ -207,8 +206,7 @@
// Save the results and ensure that the document is properly closed:
document.save( "Hello World.pdf");
document.close();
-
-
+
This small sample shows how to encrypt a file so that it can be viewed, but not printed.
-PDDocument doc = PDDocument.load("filename.pdf");
+PDDocument doc = PDDocument.load("filename.pdf");
// Define the length of the encryption key.
// Possible values are 40 or 128 (256 will be available in PDFBox 2.0).
@@ -185,8 +185,7 @@
doc.save("filename-encrypted.pdf");
doc.close();
-
-
+
Load the PDF document.
-:::java
+:::java
// load the document
PDDocument pdfDocument = PDDocument.loadNonSeq(new File(... ), null);
-
-
+
Get the document catalog and the AcroForm which might be contained within.
-:::java
+:::java
// get the document catalog
PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
-
-
+
Retrieve an individual field and set its value.
-:::java
+:::java
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null)
{
PDField field = (PDField) acroForm.getField( "fieldName" );
field.setValue("new field value");
}
-
-
+
If a field is nested within the form tree a fully qualified name might be provided to access the field.
-:::java
+:::java
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null)
{
PDField field = (PDField) acroForm.getField( "fieldsParentName.fieldName" );
field.setValue("new field value");
}
-
-
+
Save and close the filled out form.
-:::java
+:::java
// pdfDocument is the document loaded in the first step
pdfDocument.save(filledForm);
pdfDocument.close();
-
-
+
The PDF/A specification requires that every font used in the document be embedded in the PDF file, so you have to load the fonts yourself. As an example:
-InputStream fontStream = CreatePDFA.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/ArialMT.ttf");
+InputStream fontStream = CreatePDFA.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/ArialMT.ttf");
PDFont font = PDTrueTypeFont.loadTTF(doc, fontStream);
-
-
+
XMPMetadata xmp = new XMPMetadata();
+XMPMetadata xmp = new XMPMetadata();
XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
xmp.addSchema(pdfaid);
pdfaid.setConformance("B");
pdfaid.setPart(1);
pdfaid.setAbout("");
metadata.importXMPMetadata(xmp);
-
-
+
It is mandatory to include the color profile used by the document. Different profiles can be used. This example takes one present in pdfbox:
-// Create output intent
+// Create output intent
InputStream colorProfile = CreatePDFA.class.getResourceAsStream("/org/apache/pdfbox/resources/pdfa/sRGB Color Space Profile.icm");
PDOutputIntent oi = new PDOutputIntent(doc, colorProfile);
oi.setInfo("sRGB IEC61966-2.1");
@@ -204,16 +202,14 @@ example takes one present in pdfbox:
oi.setOutputConditionIdentifier("sRGB IEC61966-2.1");
oi.setRegistryName("http://www.color.org");
cat.addOutputIntent(oi);
-
-
+
The complete example can be found in pdfbox-example. The source file is
-src/main/java/org/apache/pdfbox/examples/pdfa/CreatePDFA.java
-
-src/main/java/org/apache/pdfbox/examples/pdfa/CreatePDFA.java
+
This small sample shows how to check the compliance of a file with the PDF/A-1b specification.
-ValidationResult result = null;
+ValidationResult result = null;
PreflightParser parser = new PreflightParser(args[0]);
try
@@ -211,8 +211,7 @@ Check Compliance with PDF/A-1b
System.out.println(error.getErrorCode() + " : " + error.getDetails());
}
}
-
-
+
This small sample shows how to render (convert to images) a PDF document using PDFBox.
-:::java
+:::java
String filename = "YOURFILENAMEHERE.pdf";
// open the document
@@ -197,8 +197,7 @@
}
doc.close();
-
-
+
Document luceneDocument = LucenePDFDocument.getDocument( ... );
-
-Document luceneDocument = LucenePDFDocument.getDocument( ... );
+
Now that you have a Lucene Document object, you can add it to the Lucene index just like you would if it had been created from a text or HTML file. The LucenePDFDocument automatically
@@ -201,12 +200,11 @@ process. The simplest is to specify the range of pages that you want to be extra
For example, to only extract text from the second and third pages of the PDF document you could do this:
-PDFTextStripper stripper = new PDFTextStripper();
+PDFTextStripper stripper = new PDFTextStripper();
stripper.setStartPage( 2 );
stripper.setEndPage( 3 );
stripper.writeText( ... );
-
-
+
NOTE: The startPage and endPage properties of PDFTextStripper are 1 based and inclusive.
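Because the range is 1 based and inclusive, a start page of 2 and an end page of 3 covers exactly two pages, which sit at zero-based positions 1 and 2 in an ordinary Java list. The following library-free sketch shows that conversion; PageRange and toZeroBasedIndices are hypothetical names used only for illustration and are not part of PDFBox.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical helper; it only illustrates the 1 based, inclusive convention.
final class PageRange {

    // Converts a 1 based inclusive page range into zero-based list indices.
    static int[] toZeroBasedIndices(int startPage, int endPage) {
        if (startPage < 1) {
            throw new IllegalArgumentException("startPage must be >= 1");
        }
        if (endPage < startPage) {
            throw new IllegalArgumentException("endPage must be >= startPage");
        }
        // rangeClosed includes endPage, matching the inclusive end of the range.
        return IntStream.rangeClosed(startPage, endPage)
                .map(page -> page - 1)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toZeroBasedIndices(2, 3)));
    }
}
```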
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/1d572c9f/content/1.8/cookbook/workingwithattachments.html
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/workingwithattachments.html b/content/1.8/cookbook/workingwithattachments.html
index 13569ec..cc59568 100644
--- a/content/1.8/cookbook/workingwithattachments.html
+++ b/content/1.8/cookbook/workingwithattachments.html
@@ -110,7 +110,7 @@
PDComplexFileSpecification
-PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
+PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
//first create the file specification, which holds the embedded file
PDComplexFileSpecification fs = new PDComplexFileSpecification();
@@ -204,8 +204,7 @@ Attachments are part of the named tree that is attached to the document catalog.
PDDocumentNameDictionary names = new PDDocumentNameDictionary( doc.getDocumentCatalog() );
names.setEmbeddedFiles( efTree );
doc.getDocumentCatalog().setNames( names );
-
-
+
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/1d572c9f/content/1.8/cookbook/workingwithfonts.html
----------------------------------------------------------------------
diff --git a/content/1.8/cookbook/workingwithfonts.html b/content/1.8/cookbook/workingwithfonts.html
index bed4e4e..38859a1 100644
--- a/content/1.8/cookbook/workingwithfonts.html
+++ b/content/1.8/cookbook/workingwithfonts.html
@@ -110,7 +110,7 @@
This small sample shows how to create a new document and print the text “Hello World” using one of the PDF base fonts.
-// Create a document and add a page to it
+// Create a document and add a page to it
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage( page );
@@ -258,14 +258,13 @@
// Save the results and ensure that the document is properly closed:
document.save( "Hello World.pdf");
document.close();
-
-
+
This small sample shows how to create a new document and print the text “Hello World” using a TrueType font.
-// Create a document and add a page to it
+// Create a document and add a page to it
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage( page );
@@ -289,8 +288,7 @@
// Save the results and ensure that the document is properly closed:
document.save( "Hello World.pdf");
document.close();
-
-
+
While it is recommended to embed all fonts for greatest portability not all PDF producer applications will do this. When displaying a PDF it is necessary to find an external font to use.
@@ -304,7 +302,7 @@ use when no mapping exists.
This small sample shows how to create a new document and print the text “Hello World” using a PostScript Type1 font.
-// Create a document and add a page to it
+// Create a document and add a page to it
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage( page );
@@ -328,8 +326,7 @@ use when no mapping exists.
// Save the results and ensure that the document is properly closed:
document.save( "Hello World.pdf");
document.close();
-
-
+
To set or retrieve basic information about the document the PDDocumentInformation object provides a high level API to that information:
-PDDocumentInformation info = document.getDocumentInformation();
+PDDocumentInformation info = document.getDocumentInformation();
System.out.println( "Page Count=" + document.getNumberOfPages() );
System.out.println( "Title=" + info.getTitle() );
System.out.println( "Author=" + info.getAuthor() );
@@ -182,8 +182,7 @@ provides a high level API to that information:
System.out.println( "Creation Date=" + info.getCreationDate() );
System.out.println( "Modification Date=" + info.getModificationDate());
System.out.println( "Trapped=" + info.getTrapped() );
-
-
+
PDF documents can have XML metadata associated with certain objects within a PDF document. For example, the following PD Model objects have the ability to contain metadata:
-PDDocumentCatalog
+PDDocumentCatalog
PDPage
PDXObject
PDICCBased
PDStream
-
-
+
The metadata that is stored in PDF objects conforms to the XMP specification; it is recommended that you review that specification. Currently there is no high level API for managing the XML metadata, so PDFBox uses the standard Java InputStream/OutputStream classes to retrieve or set it.
-PDDocument doc = PDDocument.load( ... );
+PDDocument doc = PDDocument.load( ... );
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDMetadata metadata = catalog.getMetadata();
@@ -218,8 +216,7 @@ or set the XML metadata.
InputStream newXMPData = ...;
PDMetadata newMetadata = new PDMetadata(doc, newXMPData, false );
catalog.setMetadata( newMetadata );
-
-
+
To add the pdfbox, fontbox, jempbox and commons-logging jars to your application, the easiest thing is to declare the Maven dependency shown below. This gives you the main pdfbox library directly and the other required jars as transitive dependencies.
-<dependency>
+<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>...</version>
</dependency>
-
-
+
Set the version field to the latest stable PDFBox version.
@@ -219,7 +218,7 @@ pdfbox library directly and the other required jars as transitive dependencies.<
The most notable such optional feature is support for PDF encryption. Instead of implementing its own encryption algorithms, PDFBox uses libraries from the Legion of the Bouncy Castle. Both the bcprov and bcmail libraries are needed and can be included using the Maven dependencies shown below.
-<dependency>
+<dependency>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk15</artifactId>
<version>1.44</version>
@@ -229,21 +228,19 @@ pdfbox library directly and the other required jars as transitive dependencies.<
<artifactId>bcmail-jdk15</artifactId>
<version>1.44</version>
</dependency>
-
-
+
Another important optional feature is support for bidirectional languages like Arabic. PDFBox uses the ICU4J library from the International Components for Unicode (ICU) project to support such languages in PDF documents. To add the ICU4J jar to your project, use the following Maven dependency.
-<dependency>
+<dependency>
<groupId>com.ibm.icu</groupId>
<artifactId>icu4j</artifactId>
<version>3.8</version>
</dependency>
-
-
+
PDFBox also contains extra support for use with the Lucene and Ant projects. Since in these cases PDFBox is just an add-on feature to these projects, you should first set up your application to use Lucene or Ant and then add PDFBox support as described on this page.
http://git-wip-us.apache.org/repos/asf/pdfbox-docs/blob/1d572c9f/content/1.8/faq.html
----------------------------------------------------------------------
diff --git a/content/1.8/faq.html b/content/1.8/faq.html
index 061e907..daf35f4 100644
--- a/content/1.8/faq.html
+++ b/content/1.8/faq.html
@@ -110,7 +110,7 @@
log4j:WARN No appenders could be found for logger (org.apache.pdfbox.util.ResourceLoader).
+log4j:WARN No appenders could be found for logger (org.apache.pdfbox.util.ResourceLoader).
log4j:WARN Please initialize the log4j system properly.
-
-
+
This message means that you need to configure the log4j logging system. See the log4j documentation for more information.
PDFBox comes with a sample log4j configuration file. To use it you set a system property like this
-java -Dlog4j.configuration=log4j.xml org.apache.pdfbox.ExtractText <PDF-file> <output-text-file>
-
-java -Dlog4j.configuration=log4j.xml org.apache.pdfbox.ExtractText <PDF-file> <output-text-file>
+
If this is not working for you then you may have to specify the log4j config file using a URL path, like this:
-log4j.configuration=file:///<path to config file>
-
-log4j.configuration=file:///<path to config file>
+
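If you are unsure what the URL form of a local path looks like, the JDK can produce it for you. The sketch below uses only java.nio.file and prints a value suitable for the log4j.configuration property; Log4jConfigUrl and toFileUrl are hypothetical names, and the default file name log4j.xml is just an assumption taken from the example above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that turns a local path into a file: URL string.
final class Log4jConfigUrl {

    static String toFileUrl(String configPath) {
        // Resolve relative paths against the current working directory first.
        Path absolute = Paths.get(configPath).toAbsolutePath().normalize();
        // toUri() yields a file: URI and percent-encodes characters such as spaces.
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "log4j.xml";
        System.out.println("-Dlog4j.configuration=" + toFileUrl(path));
    }
}
```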
PDDocument doc = new PDDocument();
+PDDocument doc = new PDDocument();
try
{
doc = PDDocument.load( "my.pdf" );
@@ -228,8 +225,7 @@ PDDocument objects; one from the “new PDDocument()” and the second by the lo
doc.close();
}
}
-
-
+
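The hunk context above mentions two PDDocument objects, one from the constructor and one from the load call. The following library-free sketch illustrates why that pattern leaves one object unclosed. TrackedResource is a hypothetical stand-in that merely counts open instances; it is not a PDFBox class and makes no claim about how PDDocument manages its resources.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource that counts how many instances are still open.
final class TrackedResource implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    TrackedResource() {
        OPEN.incrementAndGet();
    }

    // Plays the role of a static load method that returns a new instance.
    static TrackedResource load() {
        return new TrackedResource();
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }

    // Mirrors the problematic pattern: construct first, then overwrite with load().
    static int leakyPattern() {
        OPEN.set(0);
        TrackedResource doc = new TrackedResource();
        try {
            doc = load(); // the first instance is no longer reachable here
        } finally {
            doc.close(); // closes only the second instance
        }
        return OPEN.get();
    }

    // Start from null so that only the loaded instance ever exists.
    static int fixedPattern() {
        OPEN.set(0);
        TrackedResource doc = null;
        try {
            doc = load();
        } finally {
            if (doc != null) {
                doc.close();
            }
        }
        return OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("still open after leaky pattern: " + leakyPattern());
        System.out.println("still open after fixed pattern: " + fixedPattern());
    }
}
```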
This small sample shows how to encrypt a file so that it can be viewed, but not printed.
-PDDocument doc = PDDocument.load(new File("filename.pdf"));
+PDDocument doc = PDDocument.load(new File("filename.pdf"));
// Define the length of the encryption key.
// Possible values are 40, 128 or 256.
@@ -131,8 +131,7 @@
doc.save("filename-encrypted.pdf");
doc.close();
-
-
+