From: Ryan Graham
To: users@cocoon.apache.org
Subject: RE: Transform PDF to XML/XHTML
Date: Mon, 3 Nov 2003 09:37:31 -0700

>> I need to transform a PDF file to XML (XHTML) format.
>> I saw an example in Cocoon of doing the opposite, i.e.
>> XML->PDF using XSL-FO.
>
> There probably is a way to do this....but it's a bit involved.
>
> There is a commercial software package available that will convert a PDF back
> into a Word document. I don't remember who sells it....ping me privately later
> (when I am back in the office) and I'll tell you where to find it. It's about
> $50.

There is a tool by CambridgeDocs called Xdoc Converter. It can take a PDF and
transform it to any flavor of XML (based on rules that you set up).
From there it can export it to a Word doc, HTML, another PDF, etc. The price
tag on this one is a bit hefty, though, and the software has a substantial
learning curve.

> You could use this tool to get into Word .doc format, then use Word or
> something similar to convert this .doc into RTF (older Word versions) or XML
> (Office 2003)....then you have clear text that you can process into XHTML.
>
> Ugly...and would take a while to put in place, but doable.

Agreed -- it is a somewhat time-consuming process.

HTH,
RG