pdfbox-commits mailing list archives

From build...@apache.org
Subject svn commit: r935202 - in /websites/staging/pdfbox/trunk/content: ./ 1.8/faq.html
Date Mon, 05 Jan 2015 23:56:32 GMT
Author: buildbot
Date: Mon Jan  5 23:56:32 2015
New Revision: 935202

Log:
Staging update by buildbot for pdfbox

Modified:
    websites/staging/pdfbox/trunk/content/   (props changed)
    websites/staging/pdfbox/trunk/content/1.8/faq.html

Propchange: websites/staging/pdfbox/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Jan  5 23:56:32 2015
@@ -1 +1 @@
-1649693
+1649694

Modified: websites/staging/pdfbox/trunk/content/1.8/faq.html
==============================================================================
--- websites/staging/pdfbox/trunk/content/1.8/faq.html (original)
+++ websites/staging/pdfbox/trunk/content/1.8/faq.html Mon Jan  5 23:56:32 2015
@@ -155,8 +155,8 @@
 <li><a href="#permission">Why do I get "You do not have permission to extract
text" on some documents?</a></li>
 <li><a href="#partially">Can't we just extract the text without parsing the whole
document or extract text as it is parsed?</a></li>
 </ul>
-<h2 id="general-questions_1">General Questions</h2>
-<h2 id="log4j">I am getting the below Log4J warning message, how do I remove it?</h2>
+<h2 id="answers-general-questions">Answers: General Questions</h2>
+<h3 id="log4j">I am getting the below Log4J warning message, how do I remove it?</h3>
 <div class="codehilite"><pre><span class="nl">log4j:</span><span
class="n">WARN</span> <span class="n">No</span> <span class="n">appenders</span>
<span class="n">could</span> <span class="n">be</span> <span class="n">found</span>
<span class="k">for</span> <span class="n">logger</span> <span
class="o">(</span><span class="n">org</span><span class="o">.</span><span
class="na">apache</span><span class="o">.</span><span class="na">pdfbox</span><span
class="o">.</span><span class="na">util</span><span class="o">.</span><span
class="na">ResourceLoader</span><span class="o">).</span>
 <span class="nl">log4j:</span><span class="n">WARN</span> <span
class="n">Please</span> <span class="n">initialize</span> <span class="n">the</span>
<span class="n">log4j</span> <span class="n">system</span> <span
class="n">properly</span><span class="o">.</span>
 </pre></div>
@@ -177,10 +177,10 @@ See the <a href="http://logging.apache.o
 
 <p>Please see <a href="https://sourceforge.net/forum/forum.php?thread_id=1254229&amp;amp;forum_id=267205">this</a>
forum thread 
 for more information.</p>
-<h2 id="threadsafe">Is PDFBox thread safe?</h2>
+<h3 id="threadsafe">Is PDFBox thread safe?</h3>
 <p>No! Only one thread may access a single document at a time. You can have multiple
threads
 each accessing their own PDDocument object.</p>
-<h2 id="notclosed">Why do I get a "Warning: You did not close the PDF Document"?</h2>
+<h3 id="notclosed">Why do I get a "Warning: You did not close the PDF Document"?</h3>
 <p>You need to call close() on the PDDocument inside the finally block, if you
 don't then the document will not be closed properly.  Also, you must close all
 PDDocument objects that get created.  The following code creates <strong>two</strong>
@@ -201,8 +201,8 @@ PDDocument objects; one from the "new PD
 </pre></div>
 
 
-<h3 id="text-extraction_1">Text Extraction</h3>
-<h2 id="notext">How come I am not getting any text from the PDF document?</h2>
+<h2 id="answers-text-extraction">Answers: Text Extraction</h2>
+<h3 id="notext">How come I am not getting any text from the PDF document?</h3>
 <p>Text extraction from a pdf document is a complicated task and there are many factors
 involved that effect the possibility and accuracy of text extraction.  It would be helpful
 to the PDFBox team if you could try a couple things.</p>
@@ -212,22 +212,22 @@ should be able to as well and it is a bu
 <li>It might really be an image instead of text.  Some PDF documents are just images
that have been scanned in.
 You can tell by using the selection tool in Acrobat, if you can't select any text then it
is probably an image.</li>
 </ul>
-<h2 id="gibberish">How come I am getting gibberish(G38G43G36G51G5) when extracting
text?</h2>
+<h3 id="gibberish">How come I am getting gibberish(G38G43G36G51G5) when extracting
text?</h3>
 <p>This is because the characters in a PDF document can use a custom encoding
 instead of unicode or ASCII.  When you see gibberish text then it
 probably means that a meaningless internal encoding is being used.  The
 only way to access the text is to use OCR.  This may be a future
 enhancement.</p>
-<h2 id="fontwidth">What does "java.io.IOException: Can't handle font width" mean?</h2>
+<h3 id="fontwidth">What does "java.io.IOException: Can't handle font width" mean?</h3>
 <p>This probably means that the "Resources" directory is not in your classpath. The
 Resources directory is included in the PDFBox jar so this is only a problem if you
 are building PDFBox yourself and not using the binary.</p>
-<h2 id="permission">Why do I get "You do not have permission to extract text" on some
documents?</h2>
+<h3 id="permission">Why do I get "You do not have permission to extract text" on some
documents?</h3>
 <p>PDF documents have certain security permissions that can be applied to them and
two 
 passwords associated with them, a user password and a master password. If the "cannot extract
text"
 permission bit is set then you need to decrypt the document with the master password in order
 to extract the text.</p>
-<h2 id="partially">Can't we just extract the text without parsing the whole document
or extract text as it is parsed?</h2>
+<h3 id="partially">Can't we just extract the text without parsing the whole document
or extract text as it is parsed?</h3>
 <p>Not really, for a couple reasons.</p>
 <ul>
 <li>If the document is encrypted then you need to parse at least until the encryption
dictionary before 


