pdfbox-dev mailing list archives

From "Tilman Hausherr (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4482) True Type vs Embedded Text Output
Date Wed, 06 Mar 2019 07:56:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785385#comment-16785385
] 

Tilman Hausherr commented on PDFBOX-4482:
-----------------------------------------

Please attach your PDF and describe what you are doing, i.e. include your Java code or mention
which utility you are using and with what parameters. Currently it reads as if you are converting
HTML to PDF and back to HTML?

> True Type vs Embedded Text Output
> ---------------------------------
>
>                 Key: PDFBOX-4482
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4482
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.14
>         Environment: Windows or Linux
>            Reporter: Stev Dempsey
>            Priority: Trivial
>
> Kinda difficult to describe, but here goes.
> We use the TinyMCE editor and then process that HTML document through MS Word to create a
PDF. All is good there. Once we have the PDF, we need to send the info to another system that
only recognizes text. We need to preserve the vertical spacing between parts of the document.
> If the Arial font is used, all works well.
> If the Times font is used, the <P> paragraphs are messed up.
> The source HTML for Times is here:
> <p><span style="font-family: times new roman,times; font-size: 10pt;">COMPARISON:
None</span></p>
> <p><span style="font-family: times new roman,times; font-size: 10pt;">TECHNIQUE:
Axial CT images obtained of the spine with sagittal and coronal reconstructions.  This is
the technique section.  This is the technique section.  This is the technique section. 
This is the technique section.  This is the technique section.  This is the technique section. 
This is the technique section.  This is the technique section.  This is the technique section. 
This is the technique section.  This is the technique section.</span></p>
> <p><span style="font-family: times new roman,times; font-size: 10pt;">FINDINGS:
No acute fracture, dislocation or abnormal lesion is shown. There is loss of the cervical
lordosis. Osteophyte is noted at C4-5, and C5-6 levels with moderate disc space narrowing.
The spinal cord is normal. No Chiari malformation. No extradural soft tissue masses or paraspinal
soft tissue masses. Upper thoracic spine is normal.  This is the findings section.  This
is the findings section.  This is the findings section.  This is the findings section. 
This is the findings section.  This is the findings section.  This is the findings section. 
This is the findings section.  This is the findings section.  This is the findings section. 
The previous lines are all one paragraph and should not have any breaks.  The next several
lines are one line (one paragraph) per spine section.</span></p>
> <p><span style="font-family: times new roman,times; font-size: 10pt;">C2-3:
No disc herniation, central stenosis or neural foraminal stenosis.</span></p>
> <p><span style="font-family: times new roman,times; font-size: 10pt;">C3-4:
Shallow central and right paracentral disc herniation. No central stenosis or neural foraminal
stenosis.</span></p>
> <p><span style="font-family: times new roman,times; font-size: 10pt;">C4-5:
Diffuse disc bulge and right posterior lateral osteophyte with uncovertebral joint hypertrophy
and moderate bilateral neural foraminal stenosis. No central stenosis.</span></p>
> <p><span style="font-family: times new roman,times; font-size: 10pt;">C5-6:
Shallow central and right posterior lateral disc herniation with mass-effect along the cervical
cord, right lateral recess stenosis and bilateral neural foraminal stenosis, moderate on the
left and mild on the right. No central stenosis.</span></p>
> <p><span style="font-family: times new roman,times; font-size: 10pt;">C6-7:
No disc herniation, central stenosis or neural foraminal stenosis.</span></p>
> <p><span style="font-family: times new roman,times; font-size: 10pt;">C7-T1:
No disc herniation, central stenosis or neural foraminal stenosis.</span></p>
> <p><span style="font-family: times new roman,times; font-size: 10pt;"><span
style="font-family: times new roman,times; font-size: 10pt;">IMPRESSION: </span><br
/><span style="font-family: times new roman,times; font-size: 10pt;">1. Moderate
degenerative disc disease with loss of cervical lordosis.  This is line 1 of the impression.</span><br
/><br /><span style="font-family: times new roman,times; font-size: 10pt;">2.
C3-4 level with shallow central and right paracentral disc herniation but no stenosis.  This
is line 2 of the impression.</span><br /><br /><span style="font-family:
times new roman,times; font-size: 10pt;">3. C4-5 level with diffuse disc bulge, right posterior
lateral osteophyte and uncovertebral joint hypertrophy with moderate bilateral neural foraminal
stenosis.  This is line 3 of the impression.</span><br /><br /><span
style="font-family: times new roman,times; font-size: 10pt;">4. C5-6 level with shallow
central and right posterior lateral disc herniation, right lateral recess stenosis and bilateral
neural foraminal stenosis, moderate on the left greater than right.  This is line 4 of the
impression.</span></span></p>
> :BREAK!
> This results in decoded output that breaks the <P> paragraphs on each line instead
of keeping each whole paragraph intact and keeping the line breaks.
> :SUBPART!
> INFO  <p>TECHNIQUE: Axial CT images obtained of the spine with sagittal and coronal
reconstructions.  This is the technique 
> INFO  </p>
> INFO  <p>section.  This is the technique section.  This is the technique section. 
This is the technique section.  This is the technique 
> INFO  </p>
> INFO  <p>section.  This is the technique section.  This is the technique section. 
This is the technique section.  This is the technique 
> INFO  </p>
> INFO  <p>section.  This is the technique section.  This is the technique section.
> INFO  </p>
> INFO  <p>FINDINGS: No acute fracture, dislocation or abnormal lesion is shown.
There is loss of the cervical lordosis. Osteophyte is 
> INFO  </p>
> INFO  <p>noted at C4-5, and C5-6 levels with moderate disc space narrowing. The
spinal cord is normal. No Chiari malformation. No 
> INFO  </p>
> INFO  <p>extradural soft tissue masses or paraspinal soft tissue masses. Upper
thoracic spine is normal.  This is the findings 
> INFO  </p>
> INFO  <p>section.  This is the findings section.  This is the findings section. 
This is the findings section.  This is the findings 
> INFO  </p>
> INFO  <p>section.  This is the findings section.  This is the findings section. 
This is the findings section.  This is the findings 
> INFO  </p>
> INFO  <p>section.  This is the findings section.  The previous lines are all
one paragraph and should not have any breaks.  The next 
> INFO  </p>
> INFO  <p>several lines are one line (one paragraph) per spine section. 
> INFO  </p>
> INFO  <p>C2-3: No disc herniation, central stenosis or neural foraminal stenosis.
> INFO  </p>
>  
> The original <P> is broken up line by line and not represented as a true paragraph.
Am I doing something wrong, or is it the conversion?
> Any help appreciated!
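
One possible workaround, whatever tool produced the INFO output above, is to rejoin the
fragments afterwards. The sketch below is only an illustration and is not part of PDFBox.
It assumes that a new paragraph always begins with an upper-case label followed by a colon
(such as "TECHNIQUE:", "FINDINGS:" or "C2-3:"), which holds for the sample report but may
not hold for other documents. The class name ParagraphJoiner, the method join and the label
pattern are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ParagraphJoiner {
    // ASSUMPTION: a paragraph starts with an upper-case label and a colon,
    // e.g. "TECHNIQUE:", "FINDINGS:", "C2-3:" or "C7-T1:".
    private static final Pattern LABEL = Pattern.compile("^[A-Z][A-Z0-9-]*:");

    // Merges line-by-line fragments into paragraphs. Any "<p>" / "</p>"
    // wrappers are stripped; fragments that are empty afterwards are skipped.
    static List<String> join(List<String> fragments) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = null;
        for (String raw : fragments) {
            String text = raw.replace("<p>", "").replace("</p>", "").trim();
            if (text.isEmpty()) {
                continue;
            }
            if (current == null || LABEL.matcher(text).find()) {
                // A label (or the very first fragment) opens a new paragraph.
                if (current != null) {
                    paragraphs.add(current.toString());
                }
                current = new StringBuilder(text);
            } else {
                // Anything else is treated as a continuation of the paragraph.
                current.append(' ').append(text);
            }
        }
        if (current != null) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList(
                "<p>FINDINGS: No acute fracture, dislocation or abnormal lesion is shown.",
                "</p>",
                "<p>noted at C4-5, and C5-6 levels with moderate disc space narrowing.",
                "</p>",
                "<p>C2-3: No disc herniation, central stenosis or neural foraminal stenosis.",
                "</p>");
        // Prints one line per reconstructed paragraph.
        join(fragments).forEach(System.out::println);
    }
}
```

The numbered IMPRESSION lines, which are separated by <br /> in the source HTML rather than
by labels, are not covered by this heuristic and would need their own rule.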



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org

