cocoon-dev mailing list archives

From Miles Elam
Subject Re: [RT] Views for readers
Date Thu, 14 Aug 2003 16:51:43 GMT
Ummm...  Quick question:  What are the use cases for this that are not 
handled by existing methods?  I mean, couldn't this be handled with an 
(as-yet unwritten) action?

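<map:match pattern="*.doc">
  <!-- children of map:act run only when the action succeeds -->
  <map:act type="catch-view">
    <map:parameter name="view-name" value="content"/>
    <map:generate type="word2xml" src="{../1}.doc"/>
    <!-- complete the pipeline -->
  </map:act>
  <!-- otherwise fall through to the plain reader -->
  <map:read src="{1}.doc"/>
</map:match>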

Jeff mentioned getting metainformation from binary data for searching, 
but surely there are so many different types of binary data, a universal 
view seems rather heavy-handed.  It works for search queries (barely, in 
my opinion).  Content manipulation clients (like WebDAV) can't pass the 
query string trigger for views.  This seems to me 
to be a one-trick pony.  To make views available for readers, it seems 
as though specificity is lost.

The point of XML was specifically structured content, yes?  Any 
conformant parser should be able to read any conformant file.  Binary 
content has no such constraint.  If both a reader and a generator are 
required in a matcher, I think some type of syntax that separates the 
two *visually* (not just conceptually) is necessary as a cue.

Putting in binary options makes all content one step worse than your 
typical HTML web page: lack of intelligent structure without hope of 
enforcing a schema.  Generators that read from Word (and other similar 
formats) have taken some time to come to fruition precisely because of 
their arbitrary nature (varying character set assumptions, embedded OLE 
objects, various content encoding blocks, etc.).   Remember, XML (in 
this case as metadata) is just one representation of structure.  The 
important thing (in my opinion) is preserving the structure.  I don't 
see that happening with further intermingling of arbitrary binary data.

I guess I'm in the camp that's glad that readers exist.  Every time I 
have run into the dreaded error that comes from trying to load the 
output of a reader into the generator of another matcher, I have found a 
sitemap organization error.  I guess I'm seeing the Cocoon version of 
"goto considered harmful."  Sure it's flexible.  Sure it's powerful.  
But will it introduce more complexity and discomfort than it resolves in 
actual practice?

Hacking the view internals seems overkill (emphasis on kill).  In line 
with the resource reader's role as "arbitrary, unorganized bit bucket with a 
MIME type," there is no universal way of delivering appropriate 
content.  The method of getting content from a Word document is very 
different from the method of content gathering from a PDF document.  
Views, as orthogonal access to similar resources (i.e. XML resources), 
don't apply.  "View source" on a text file is straightforward.  "View 
source" on an XML file even more so.  What is "View source" on reader 
content?  You would have to assign a different view to each class of 
reader or put in some MIME type matching hack.  Neither is less work or 
easier to grok than simply putting an action or selector in the 
appropriate matchers, I think.
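
As a rough sketch of the selector variant, a PDF matcher might look 
something like the following.  The assumptions here are not from this 
thread: a selector declared under the name "request-parameter", a 
hypothetical "pdf2xml" generator, and a hypothetical "show" request 
parameter standing in for whatever trigger the client can send.

<map:match pattern="*.pdf">
  <map:select type="request-parameter">
    <!-- "show" is a hypothetical parameter name -->
    <map:parameter name="parameter-name" value="show"/>
    <map:when test="content">
      <!-- "pdf2xml" is a hypothetical generator -->
      <map:generate type="pdf2xml" src="{1}.pdf"/>
      <map:serialize type="xml"/>
    </map:when>
    <map:otherwise>
      <map:read src="{1}.pdf" mime-type="application/pdf"/>
    </map:otherwise>
  </map:select>
</map:match>

Each matcher names its own generator, so the format-specific extraction 
stays visible in the sitemap instead of hiding behind a universal view.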

If this type of thing moves forward, I would rather see more specificity 
going into readers than twiddling with what comes out: a PDF reader, a 
Word reader, a PostScript reader, etc.  In that case you're separating 
out by schema, by at least some form of contract.  The alternative is 
equivalent to saying, "let's just make one class of transformer because 
all XML is alike and only three transformation options are available."

- Miles Elam

P.S. Sorry to start trouble, but I think someone had to mention it.
