xml-general mailing list archives

From Mike Dierken <m...@DataChannel.com>
Subject RE: Cocoon the other way???
Date Fri, 02 Jun 2000 17:42:24 GMT

Well, there are two issues:
 - getting to the data with an API & data model
 - getting to the knowledge

As others have said, you can create an 'XML view' of whatever you want -
sometimes it's easy, sometimes it's hard & sometimes it might even perform
well. Trying to map the native data model into the limited XML data model
works for many/some things (depending on your philosophy) - but writeback
will be a pain.

The 'data view' aspect has some grounding in something called Groves. Groves
have a model where you declare the data types and their properties in a
'property set' - this is the 'all you can ever possibly get to' description
of what a Word document or PDF document is. You can also specify a sub-set
of this - a 'grove plan' - of what you want to deal with in some processing.
Then you build (or borrow) a 'grove constructor' that is aware of the native
data - this is the plug-in/adaptor/connector part. You now have a
consistent object model to talk to the native data. This object model is
more generalized than the XML object model. Essentially it is a graph of
objects that have named properties.
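
To make the pieces above concrete, here is a minimal sketch in Python of a property set, a grove plan and a toy grove constructor. Every name in it (PROPERTY_SET, GROVE_PLAN, Node, check_plan, build_grove, and the nested-dict "native document") is a hypothetical stand-in chosen for illustration; it is not the notation of the Groves standard or of any real grove implementation.

```python
from dataclasses import dataclass, field

# Hypothetical property set: everything you could ever get to,
# listed as node class -> names of the properties it may carry.
PROPERTY_SET = {
    "document": {"title", "author", "paragraphs"},
    "paragraph": {"style", "text", "revision"},
}

# Hypothetical grove plan: the subset this processing run cares about.
GROVE_PLAN = {
    "document": {"title", "paragraphs"},
    "paragraph": {"text"},
}


@dataclass
class Node:
    """One object in the graph: a class name plus named properties."""
    node_class: str
    properties: dict = field(default_factory=dict)


def check_plan(plan, property_set):
    """Reject a plan that names classes or properties the property set lacks."""
    for node_class, props in plan.items():
        unknown = props - property_set.get(node_class, set())
        if node_class not in property_set or unknown:
            raise ValueError(f"plan exceeds property set for {node_class!r}")


def build_grove(native, plan):
    """Toy grove constructor: walk the native data (nested dicts here)
    and keep only the properties the plan asks for."""
    node_class = native["class"]
    node = Node(node_class)
    for name in sorted(plan.get(node_class, ())):
        if name not in native:
            continue
        value = native[name]
        if isinstance(value, list):
            value = [build_grove(v, plan) if isinstance(v, dict) else v
                     for v in value]
        elif isinstance(value, dict):
            value = build_grove(value, plan)
        node.properties[name] = value
    return node


# Stand-in for data a real constructor would read from a native file.
native_doc = {
    "class": "document",
    "title": "Report",
    "author": "Someone",
    "paragraphs": [
        {"class": "paragraph", "style": "Body", "text": "Hello"},
    ],
}

check_plan(GROVE_PLAN, PROPERTY_SET)
grove = build_grove(native_doc, GROVE_PLAN)
```

The processing code only ever sees Node objects and their named properties; swapping in a different constructor for a different native format would leave that code unchanged, which is the point of the adaptor part.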

The second part - getting to the knowledge - is still a pain. But with some
'uniform view' (XML, DOM, Groves, etc) you can create a hand-crafted
processor to attempt to extract some meaning. If the plug-in that expresses
the native data in a uniform view is aware of the semantics of the data
(which it probably is), it could express the semantic info in a way that is
easy to pick up in a post-processor.
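
As a rough illustration of that last point, suppose the plug-in marks what it knows with a `role` attribute in its XML output. The attribute name, the element names and the sample content below are all assumptions made up for this sketch, not something Cocoon or any particular converter emits. A post-processor can then collect the marked pieces with Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# Hypothetical uniform view produced by a format-aware plug-in.
# The plug-in knew which paragraphs were the title and the author,
# so it recorded that in a "role" attribute (an assumed convention).
UNIFORM_VIEW = """\
<doc>
  <para role="title">Quarterly Report</para>
  <para role="author">A. Writer</para>
  <para>Sales rose in the second quarter.</para>
</doc>
"""


def extract_roles(xml_text):
    """Collect the text of every element that carries a role attribute."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        role = elem.get("role")
        if role is not None:
            found.setdefault(role, []).append((elem.text or "").strip())
    return found


roles = extract_roles(UNIFORM_VIEW)
```

Without such hints the post-processor would have to guess the title from font size or position, which is the hand-crafted and painful route.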


see-also: http://www.oasis-open.org/cover/groves.html
see-also: http://www.prescod.net/groves/shorttut/

> -----Original Message-----
> From: Samuel Kock [mailto:skock@cs.up.ac.za]
> Sent: Monday, May 29, 2000 6:11 AM
> To: General
> Subject: Cocoon the other way???
> Hi
> I am quite new to this list, but have been following the goings-on for
> some time. Most things you people talk about are a bit Greek to me, but
> I do have one question:
> At the moment, as I understand it, Cocoon is used to go from XML to
> HTML, Word files, PDF, etc... Am I right?
> Is it possible to maybe write an extension to Cocoon that would go the
> other way? For example, convert a PDF file to an XML file using (I
> suppose) XSLT? Or an RTF or HTML file, for that matter?
> I am very interested in this, since my Master's thesis is about this,
> and if I can use the basic Cocoon foundation as a starting point, just
> concentrating on my bits would be much easier...
> Does anybody have any ideas/thoughts?
> Regards
> Samuel Kock
> University of Pretoria
