xml-general mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: Cocoon the other way???
Date Mon, 29 May 2000 13:28:53 GMT
Samuel Kock wrote:
> 
> Hi
> 
> I am quite new to this list, but have been following the goings on for
> some time. Most things you people talk about are a bit Greek to me, but
> I do have one question:
> 
> At the moment, as I understand it, Cocoon is used to go from XML to
> HTML, Word files, PDF, etc... Am I right?

Almost. Word is not supported, nor is it planned right now.
 
> Is it possible to maybe write an extension to Cocoon that would go the
> other way? For example, convert a PDF file to an XML file using (I
> suppose) XSLT? Or an RTF or HTML file, for that matter?

This discussion comes up once in a while.

XSLT cannot work on non-XML content. You cannot process PDF with XSLT.

You could, in theory, create a PDF XML adaptation, by creating an XML
schema that mimics PDF. At that point, you could XSLT-process it, but
this would _not_ be any different from processing your original PDF
file _by_hand_.
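
To illustrate why an XSLT processor has nothing to work with here, this
is a minimal sketch in Python using only the standard library's XML
parser. The sample bytes and the helper name is_well_formed_xml are
made up for illustration; the point is only that a PDF header is
rejected before any stylesheet could be applied.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample inputs: the start of a PDF file and a tiny XML document.
pdf_bytes = b"%PDF-1.3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xml_bytes = b"<page><title>Hello</title></page>"


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if the XML parser accepts the input, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        # The input is not XML syntax, so no XML tool (XSLT included) can read it.
        return False


print(is_well_formed_xml(xml_bytes))
print(is_well_formed_xml(pdf_bytes))
```

Any XSLT engine sits behind a parser like this one, so the PDF never
reaches the transformation step.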
 
> I am very interested in this, since my Master's thesis is about this,
> and if I can use the basic Cocoon foundation as a starting point, just
> concentrating on my bits would be much easier...

Cocoon is not likely to be useful for you if you plan to go the other
direction.
 
> Does anybody have any ideas/thoughts?

Doing "anything" -> XML translation doesn't make sense. It's like
talking about converting Word documents to Unicode, or Russian Word
documents to ASCII. Plain nonsense.

XML is a syntax, not a language. Every language has its own syntax, but
some of them share a syntax. Just like some file formats are binary and
others are text.
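
A small sketch of the syntax/language distinction, again in Python with
the standard library. The two sample documents are invented and only
loosely modelled on HTML and DocBook; the helper element_names is
hypothetical. The same parser reads both because they share the XML
syntax, but it knows nothing about what either vocabulary means.

```python
import xml.etree.ElementTree as ET

# Two different "languages" (vocabularies) written in the same XML syntax.
html_like = "<html><body><h1>Intro</h1><p>Text</p></body></html>"
docbook_like = "<article><title>Intro</title><para>Text</para></article>"


def element_names(doc):
    """List element names in document order; the parser only sees syntax."""
    return [el.tag for el in ET.fromstring(doc).iter()]


print(element_names(html_like))
print(element_names(docbook_like))
```

Mapping h1 to title, or p to para, is a decision about the two
languages, and the parser cannot make it for you.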

So, do you want to transform the syntax of the language, or extract
information from one language into another? If the second, going from
PDF, say, to DocBook (an XML language for technical book writing) is not
different from doing OCR over faxes to get characters based on their
shape.

Possible, but heuristic.
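
What "heuristic" means in practice can be sketched roughly as follows.
Everything here is an assumption made for illustration: the (text, font
size) runs stand in for whatever a real PDF extractor would return, the
rule "larger than the median size means title" is just one possible
guess, and guess_docbook is a hypothetical helper. The output only uses
DocBook-like element names and is not validated against DocBook.

```python
import xml.etree.ElementTree as ET
from statistics import median

# Hypothetical text runs as (text, font size); real extraction is out of scope.
runs = [
    ("Cocoon Overview", 18.0),
    ("Cocoon publishes XML.", 10.0),
    ("It uses XSLT.", 10.0),
]


def guess_docbook(text_runs):
    """Guess structure from font size: a heuristic that can easily be wrong."""
    body_size = median(size for _, size in text_runs)
    article = ET.Element("article")
    for text, size in text_runs:
        # Assumption: anything larger than the typical body size is a title.
        tag = "title" if size > body_size else "para"
        ET.SubElement(article, tag).text = text
    return article


result = ET.tostring(guess_docbook(runs), encoding="unicode")
print(result)
```

A pull quote set in large type would be mislabelled as a title by this
rule, which is exactly the kind of error OCR-style guessing produces.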

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------
