cocoon-users mailing list archives

From Ryan Graham <Ryan.Gra...@apollogrp.edu>
Subject RE: Transform PDF to XML/XHTML
Date Mon, 03 Nov 2003 16:37:31 GMT

>> I need to transform a PDF file to XML (XHTML) format.
>> I saw an example in Cocoon of doing the opposite, i.e.
>> XML->PDF using XSL-FO.
>
>There probably is a way to do this....but it's a bit involved.
>
>There is a commercial software package available that will convert a PDF
>back into a Word document.  I don't remember who sells it....ping me
>privately later (when I am back in the office) and I'll tell you where to
>find it.  It's about $50.

There is a tool by CambridgeDocs called Xdoc Converter.  It can take a PDF
and transform it to any flavor of XML (based on rules that you set up).
From there it can export it to a Word Doc, HTML, another PDF, etc.  The
price tag on this one is a bit hefty though, and there is a substantial
learning curve for the software.

>You could use this tool to get into Word .doc format, then use Word or
>something similar to convert this .doc into RTF (older Word versions) or
>XML (Office 2003)....then you have clear text that you can process into
>XHTML.
>
>Ugly...and would take a while to put in place, but doable.

Agreed -- somewhat of a time-consuming process.
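
For the last step (clear text into XHTML), a minimal sketch in Python might
look like the following.  It assumes the text has already been saved out of
Word as plain text with blank lines between paragraphs.  The function name
text_to_xhtml is made up for illustration and is not part of Cocoon or of
any of the tools mentioned above.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def text_to_xhtml(text, title="Converted document"):
    """Wrap blank-line-separated paragraphs in a minimal XHTML page.

    Hypothetical helper: assumes plain-text input, one paragraph per block.
    """
    html = ET.Element("html", xmlns=XHTML_NS)
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")

    # Split on blank lines, then collapse hard line wraps inside a paragraph.
    for block in re.split(r"\n\s*\n", text):
        para = " ".join(block.split())
        if para:
            # ElementTree escapes &, < and > in text when serializing.
            ET.SubElement(body, "p").text = para

    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(text_to_xhtml("First line\nwrapped.\n\nSecond & last."))
```

This only recovers paragraphs.  Headings, lists and tables would need extra
rules, which is where most of the time goes.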

HTH,
RG
