cocoon-users mailing list archives

From: "Conal Tuohy" <con...@paradise.net.nz>
Subject: webmail
Date: Thu, 01 Aug 2002 23:53:26 GMT

Hi Justin

Just a note to keep you up to date on where I'm at, and maybe provoke some
response:

I've been reading more and more, and thinking and drawing architectural
diagrams ... it's more complicated than I first thought. I think this is an
issue for your project too.

Currently I'm still thinking that the JavaMail access should be a Source,
not implementing XMLizable, i.e. it would provide messages in the
"message/rfc822" format. So that hasn't changed.

The parsing to XML should probably be done by either an XMLizer, or a
Generator. Or possibly, another Source, layered on top of the MIME source.
This is the issue that's exercising me at the moment. The fact is that a
MIME message is not a simple data type: it is in fact a kind of file-system,
with other files and directories in it (the MIME-parts).
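
As a rough illustration of the "file-system" view, here is a minimal sketch in
plain Java (no JavaMail or Cocoon classes). The MimeNode class and its
slash-separated, 1-based path syntax are hypothetical, invented only for this
example: a multipart part acts like a directory, a leaf part like a file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a MIME message as a tree of parts.
public class MimeNode {
    public final String contentType;
    public final List<MimeNode> children = new ArrayList<>();

    public MimeNode(String contentType) {
        this.contentType = contentType;
    }

    // Adds a child part and returns this node so calls can be chained.
    public MimeNode add(MimeNode child) {
        children.add(child);
        return this;
    }

    // Resolves a path such as "2/1" (1-based child indexes), much like
    // walking directories. Returns null if any step is out of range.
    public MimeNode resolve(String path) {
        MimeNode node = this;
        if (path.isEmpty()) {
            return node;
        }
        for (String step : path.split("/")) {
            int index = Integer.parseInt(step) - 1;
            if (index < 0 || index >= node.children.size()) {
                return null;
            }
            node = node.children.get(index);
        }
        return node;
    }
}
```

With a "multipart/related" root holding a "text/html" part and an "image/gif"
part, resolve("2") would return the image part.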

The big thing which I'd been neglecting was how to provide access to these
MIME-parts (i.e. a PART of a message) to the relevant components of Cocoon.
For instance, a MIME message may contain a part which is in "text/html"
format, and this html may refer (with an IMG tag) to an image which is in
another part of the MIME message, with an "image/gif" mime-type for
instance. To render this doc to a web-browser as html, or as PDF or
whatever, it will be necessary for the Cocoon pipeline to extract this gif
image from within the message, and feed it to the browser or Batik, as
required. Concretely, the web app will have to generate a web page
containing an IMG whose src is some URL which Cocoon can then use to find
the gif image inside that particular message.
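
One way this could look, as a minimal sketch only: rewrite the "cid:"
references in the HTML part into URLs the web app can serve. The CidRewriter
class and the "message/<id>/part/<content-id>" URL layout are assumptions made
up for this example, not an existing Cocoon convention, and a regex is used
only to keep the sketch short (a real pipeline would do this on the XHTML).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns src="cid:..." into a URL the sitemap could match.
public class CidRewriter {
    // Matches src="cid:..." or src='cid:...' (case-insensitive).
    private static final Pattern CID_SRC = Pattern.compile(
        "(src\\s*=\\s*)([\"'])cid:([^\"']+)\\2", Pattern.CASE_INSENSITIVE);

    public static String rewrite(String html, String messageId) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Assumed URL layout: message/<message id>/part/<content id>
            String url = "message/" + enc(messageId) + "/part/" + enc(m.group(3));
            String replacement = m.group(1) + m.group(2) + url + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

The sitemap would then need a matcher for the same URL layout, so that a
request for such a URL pulls the matching part out of the message.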

Much the same applies to MIME-parts which contain message attachments of
some arbitrary mime-type ("application/octet-stream" is a good one) which
Cocoon can't do anything useful with, but which a browser might understand,
or at least download as files.

I have to do this because my list archive needs to handle attachments.

It seems to me I've got 2 main options:

1) XMLize everything

Currently I'm tending towards using an XMLizer which will convert a
"message/rfc822" byte-stream into SAX events (possibly using the XMSG
schema, http://www.w3.org/TR/xmsg/, rather than the XMTP schema I've used
before; I'm not sure about this yet).

This XMLizer would handle all the MIME-parts (even non-XML parts would be
returned as "lumps" of data) and these could therefore be handled using the
various XML-processing mechanisms (XInclude, XSLT, etc.) even without
necessarily being able to process their actual contents. So a GIF image
MIME-part would appear as a <data> element in the SAX stream (see
http://www.w3.org/TR/xmsg/#N632) containing some text-encoded GIF data (i.e.
Base64 encoded). For a binary mime-part, Cocoon processing would be limited
to kind of "routing" it through to the browser, without transforming it on
the way. To use this technique, we'd also need a MIMEPartSerializer which
would decode this part into a binary stream, for return to a browser.
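
The decoding step itself is small. A minimal sketch, assuming the text content
of the <data> element has already been collected into a String: the
MimePartDecoder class is hypothetical and is not a Cocoon Serializer, it only
shows the Base64-to-bytes step such a serializer would perform, using the
JDK's java.util.Base64.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Base64;

// Hypothetical helper showing only the decode step of a MIMEPartSerializer.
public class MimePartDecoder {
    // Decodes Base64 text as found in a MIME body; the MIME decoder
    // ignores line breaks and other non-Base64 characters.
    public static byte[] decode(String base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    // Writes the decoded bytes to the response stream unchanged.
    public static void writeTo(String base64Text, OutputStream out) throws IOException {
        out.write(decode(base64Text));
    }
}
```

For example, the Base64 text "R0lGODlh" decodes to the six bytes "GIF89a",
the signature at the start of a GIF file.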

Of course, MIME-parts of XML would be parsed fully, and mime-parts of HTML
would be converted to XHTML with JTidy.

Using this approach, to refer to a MIME-part in the sitemap, you would
generate the full message, then extract the part using a transformer for
instance. There'd be no need to encode everything into the source url used
in the sitemap, and this keeps the Source simple (means we can use a
FileSource to read emails from individual files, too).

2) Handle non-XML parts in their native format

I'm not so clear on how this one would work, but I haven't yet ruled it out
entirely ... I still need to get it clear in my head.

We'd need some component that would return a MIME-part from within a
message, in a native format. It seems to me that it will need to implement
Source (as far as Cocoon is concerned, this is the interface for reading a
non-XML resource). But it must be able to get the MIME-part from either a
file or url or from some kind of JavaMail source. So it would be a Source
layered on top of some other source (AFAIK this would be a unique pattern in
Cocoon, but not unreasonable given the nature of MIME-messages).

I'm not sure if this is really a runner: the MIME-parts contain more data
than a Source could provide. For instance, a Content-Disposition. This is
one reason why I'm inclining towards the XMLizer approach. The trouble is
that Cocoon is set up for doing magic with XML, but non-XML data is either
converted to XML or else just passed through with a Reader. There's no
facility for "pipelines" of non-XML data.


--------------------------------------------------------------------------------
Whew! I've got another busy day today for another client and may not get
anything done on it, but over the weekend I'll spend some more time on it
and hopefully begin some actual programming work.

I've also been trying to define a URL scheme for referring to JavaMail
resources. This also relates to the "cid:" and "mid:" schemes which are used
for hyperlinks within a given MIME message, though as I said, I'd prefer to
leave this out of the javamail or pop or whatever URLs and deal with it
inside the Cocoon pipeline.

It also relates to how to represent the contents of a JavaMail FOLDER to
Cocoon: whether directly as XML or with mime-type "Multipart/report"
http://www.ohse.de/uwe/rfc/rfc1892.html or "Multipart/digest"
http://deesse.univ-lemans.fr:8003/Connected/RFC/1521/19.html which can then
be XMLized with a Cocoon XMLizer. See
http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html for details of
multipart formats.

Anyway ... I'm off now to have some lunch and then I have to visit a client.
I hope your work is going ok.

Con
