cocoon-dev mailing list archives

From Sylvain Wallez
Subject Re: [RT] Views for readers
Date Thu, 14 Aug 2003 17:24:48 GMT
Miles Elam wrote:

> Ummm...  Quick question:  What are the use cases for this that are not 
> handled by existing methods?  I mean, couldn't this be handled with an 
> (as-yet unwritten) action?
> <map:match pattern="*.doc">
>  <map:act type="catch-view">
>    <map:parameter name="view-name" value="content"/>
>    <map:generate type="word2xml" src="{../1}.doc"/>
>    <!-- complete the pipeline -->
>  </map:act>
>  <map:read src="{1}.doc"/>
> </map:match>

Go back to the first post of this thread, where (in the last paragraph) I 
proposed something similar. The whole discussion is about how we could 
have a syntax that doesn't introduce such verbosity in the sitemap.

> Jeff mentioned getting metainformation from binary data for searching, 
> but surely there are so many different types of binary data, a 
> universal view seems rather heavy-handed.  It works for search queries 
> (barely, in my opinion).  For content manipulation clients (like 
> WebDAV), these clients can't pass the query string trigger for views.  
> This seems to me to be a one-trick pony.  To make views available for 
> readers, it seems as though specificity is lost.
> The point of XML was specifically structured content, yes?  Any 
> conformant parser should be able to read any conformant file.  Binary 
> content has no such constraint.  If both a reader and a generator are 
> required in a matcher, I think some type of syntax that separates the 
> two *visually* (not just conceptually) is necessary as a cue.
> Putting in binary options makes all content one step worse than your 
> typical HTML web page: lack of intelligent structure without hope of 
> enforcing a schema.  Generators that read from Word (and other similar 
> formats) have taken some time to come to fruition precisely because of 
> their arbitrary nature (varying character set assumptions, embedded 
> OLE objects, various content encoding blocks, etc.).   Remember, XML 
> (in this case as metadata) is just one representation of structure.  
> The important thing (in my opinion) is preserving the structure.  I 
> don't see that happening with further intermingling of arbitrary 
> binary data.
> I guess I'm in the camp that's glad that readers exist.  Every time I 
> have run into the dreaded error that comes from trying to load the 
> output of a reader into the generator of another matcher, I have found 
> a sitemap organization error.  I guess I'm seeing the Cocoon version 
> of "goto considered harmful."  Sure it's flexible.  Sure it's 
> powerful.  But will it impart more complexity and discomfort than it 
> solves in actual practice?
> Hacking the view internals seems overkill (emphasis on kill).  Inline 
> with resource reader's role as "arbitrary, unorganized bit bucket with 
> a MIME type," there is no universal way of delivering appropriate 
> content.  The method of getting content from a Word document is very 
> different from the method of content gathering from a PDF document.  
> Views, orthogonal access to similar resources (ie. XML resources), 
> doesn't apply.  "View source" on a text file is straightforward.  
> "View source" on an XML file even more so.  What is "View source" on 
> reader content?  You would have to assign a different view to each 
> class of reader or put in some MIME type matching hack.  Neither is 
> less work or easier to grok than simply putting in an action or 
> selector in the appropriate matchers I think.
> If this type of thing moves forward, I would rather see more 
> specificity going into readers than twiddling with what comes out: a 
> PDF reader, a Word reader, a Postscript reader, etc.  In that case 
> you're separating out by schema, by at least some form of contract.  
> The alternative is equivalent to saying, "let's just make one class of 
> transformer because all XML is alike and only three transformation 
> options are available anyway."

As I explained in several replies, there's no equivalence between a 
reader and a generator able to parse a given binary format. There needs 
to be some kind of adaptation/extraction before feeding the view.

And what you describe above as "a PDF reader, a Word reader, a 
Postscript reader, etc." are IMO nothing more than _generators_, just 
like the SWF and MIDI generators we already have.

Let's consider the MIDI example. Suppose we have a large collection of 
karaoke files (MIDI supports embedded text that can be played on screen 
while playing the music), and we want to index the text of these songs 
for easy retrieval (along with some other meta-data).

Here's a sitemap example, using the current syntax:

<map:match pattern="*.mid">
  <map:act type="catch-view" src="content">
    <map:generate type="midi" src="{../1}.mid"/>
    <map:transform src="xmidi2xdoc.xsl" label="content-label"/>
    <!-- should never come here -->
    <map:serialize type="xml"/>
  </map:act>
  <map:read src="{1}.mid"/>
</map:match>

(the "content" view starts at the "content-label" label to clearly 
distinguish the two notions).

And the proposed shorter one:

<map:match pattern="*.mid">
  <map:read src="{1}.mid" unless-label="content"/>
  <map:generate type="midi" src="{1}.mid"/>
  <map:transform src="xmidi2xdoc.xsl" label="content-label"/>
  <!-- should never come here -->
  <map:serialize type="xml"/>
</map:match>

Note also that the "catch-view" action is not an easy thing to do, as 
the view is defined on the environment object which is theoretically not 
visible to components.

Furthermore, it would be better to catch on labels, since several views 
can be plugged on a given label (e.g. "content" & "pretty-content"). And 
it would be impossible for the action to access this information.
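
For illustration, here is a minimal sketch of how two views could hang off 
the same label, assuming the usual <map:views> declaration with a 
from-label attribute. The stylesheet name "xdoc2html.xsl" and the choice 
of serializers are hypothetical, not something taken from an actual 
sitemap:

<map:views>
  <!-- raw XML starting at the labelled pipeline step -->
  <map:view name="content" from-label="content-label">
    <map:serialize type="xml"/>
  </map:view>
  <!-- same starting point, rendered for humans (hypothetical stylesheet) -->
  <map:view name="pretty-content" from-label="content-label">
    <map:transform src="xdoc2html.xsl"/>
    <map:serialize type="html"/>
  </map:view>
</map:views>

Both views branch from "content-label", so a test on the label covers 
either of them, whereas a test on a view name would have to enumerate 
each view separately.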

> P.S. Sorry to start trouble, but I think someone had to mention it. 

No trouble. Just lots of misunderstandings in this thread, I guess.


Sylvain Wallez                                  Anyware Technologies 
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -
