cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [RT] Webdavapps with Cocoon
Date Sun, 27 Jul 2003 09:38:24 GMT
replying to both Gianugo and Marc in the same email for brevity.

On Friday, Jul 25, 2003, at 17:08 Europe/Rome, Marc Portier wrote:

> Gianugo Rabellino wrote:
>> Stefano Mazzocchi wrote:
>
> <snip />
>
>> Now Cocoon, in its present incarnation, is heavily biased by the 
>> "read-only" syndrome, and this makes IMO very hard to enter the 
>> WebDAV world. I see two serious areas where WebDAV support needs 
>> careful (re)thinking of core Cocoon patterns:
>
> I think this applies also to more classic file-upload schemes?

Yes, it totally does. the way file-upload is handled today is just one 
aspect of a more general 'polishing outside-in flow of information' for 
cocoon.

(note I used the term "polishing" not "rethinking", see below why)

>> 1) URI space decoupling being irreversible: while this is a *major* 
>> feature of Cocoon (and something that might help immensely when 
>> applied to a DAV environment: views on WebDAV would really kick ass, 
>> imagine presenting your XML files as virtual OpenOffice .sxw that are 
>> assembled /disassembled on the fly), the drawback is that, in most 
>> cases, it's impossible to work your way from the pipeline result to 
>> the single pieces that make that result happen. Even the simplest 
>> XSLT transformation can't be reversely applied, so now there is no 
>> way to understand how a resource should be treated in a symmetric 
>> way when requested or uploaded. Oh yes, you can
>
> hm, do we really need to look at it as symmetric?

No, we don't. I've been thinking about this a lot and I think that 
symmetry is not only a holy grail, but it's the wrong grail to consider 
holy. Read on.

> I know we are tempted to do so, but is it a must?

It is tempting, but symmetry-driven design is bad. we must understand 
what we want, why and what is limiting us.

> Is it imposed by current webdav enabled editors?

It has been already said that webdav is the most under-hyped technology 
ever.

Microsoft said in the Halloween documents that they pushed for webdav 
to be a supercomplex specification so that opensource wouldn't be able 
to implement it. Greg Stein (the current ASF chairman, BTW) finished 
mod_dav in a few days, disturbed by those documents (if you ever meet 
Greg, ask him, it's a pretty funny story and he's very proud of having 
done that [he worked for microsoft before])

As a result of this, Microsoft moved away from webdav (probably they 
thought it was not complex enough) and into web services (will the 
SOAP/WSDL/UDDI/BPEL4WS stack be hard enough for OSS to catch up? 
hopefully we'll be smarter and just keep going with good old HTTP style 
WS).

As a result, webdav was (more or less) abandoned by the market. 
Subversion is the only use of webdav that goes beyond saving a file on 
disk thru your web folder (whose implementation sucks ass and I bet is 
not going to get better in the future, in favor of a SOAP-based 
document upload web service). Again, Greg Stein is behind the effort.

WebDAV is a very generic protocol (just like HTTP is) but people are 
influenced by implementations more than by the protocol design 
itself. For example, almost everybody on the web believes that

  http://blah.com

and

  http://blah.com/

are the same URL just because all web clients will call

  GET / HTTP/1.0

on both requests. But they don't know that

  http://blah.com/news

and

  http://blah.com/news/

are two different URLs and it's the web server that (normally! but 
nobody ever specified this behavior anywhere!) translates the first 
into the second if the folder 'news' is found in the file system that 
mounts to that URL space.
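
A minimal sketch of why the two forms really are different URLs, using 
only the Python standard library. The file name `today.xml` is made up 
for illustration; the point is that a relative reference resolves 
differently depending on the trailing slash.

```python
from urllib.parse import urljoin, urlsplit

# At the root, the parsed paths already differ: empty versus "/".
root_without_slash = urlsplit("http://blah.com").path
root_with_slash = urlsplit("http://blah.com/").path

# Below the root, the trailing slash decides what "news" is:
# a leaf resource (replaced when resolving) or a collection (kept).
no_slash = urljoin("http://blah.com/news", "today.xml")
with_slash = urljoin("http://blah.com/news/", "today.xml")

print(repr(root_without_slash), repr(root_with_slash))
print(no_slash)
print(with_slash)
```

Without the slash, `today.xml` lands next to `news`; with the slash it 
lands inside it, which is exactly the file-versus-directory distinction.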

Note that on a real FS, everybody knows the difference between

  /home/blah/news

and

  /home/blah/news/

because the OS enforces type checking on these (on a POSIX file system 
you cannot open a directory for writing as a file, for example).

The above weakness of URL space handling is the first thing that 
severely hurt the WebDAV world. [note: a bug in microsoft web folders 
eliminates the trailing slash from the URL before sending the HTTP 
request, go figure! it means that nobody in microsoft ever thought 
about webdav-editing the root of a folder (which is normally its index, 
or default content in IIS terms)]

Some say (even Marc suggests) that forcing DAV to perform all the 
actions on the same URL might be a reason for its poor success, but I 
disagree because it doesn't take resource views into consideration.

If I have a resource like

  http://blah.com/news/

and I want to edit it, I could ask for

  http://blah.com:8888/news/
  http://edit.blah.com/news/
  http://blah.com/news/?view="edit"

which are all 'orthogonal' ways of asking a different view of the same 
resource accessing it thru a parallel URL space (but with different 
behaviors)

I normally prefer the virtual-host approach. something like this

    [frontend] <- [repository] <- [backend]
  http://blah.com             http://edit.blah.com

where frontend and backend are separated (the frontend might even be a 
static representation of the saved content, say, created by a cronned 
Forrest every hour or so).

The above setup frees you from the need to be symmetric.
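
A small sketch of the three parallel URL spaces listed above, derived 
mechanically from a frontend URL. The function name `edit_views` and 
its defaults are assumptions for illustration, and the query form is 
written as `view=edit` without the quotes.

```python
from urllib.parse import urlsplit, urlunsplit

def edit_views(url, port=8888, host_prefix="edit.", query="view=edit"):
    """Return the 'edit' counterparts of a frontend URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname
    return {
        # same host and path, different port
        "port": urlunsplit((parts.scheme, f"{host}:{port}", parts.path, "", "")),
        # same path, different virtual host
        "vhost": urlunsplit((parts.scheme, host_prefix + host, parts.path, "", "")),
        # same host and path, view selected by a query parameter
        "query": urlunsplit((parts.scheme, parts.netloc, parts.path, query, "")),
    }

views = edit_views("http://blah.com/news/")
for name, view_url in views.items():
    print(name, view_url)
```

In all three cases the path `/news/` is untouched, so the frontend and 
the backend share one URL space and differ only in behavior.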

> (they want to put back where they got I assume?)
>
> actually if you look at the combination of 
> matchers/request-method-selector you wrote up it more looks like the 
> request-method being part of the uri-request space almost?

I dislike this. the action should not be encoded in the URI space.

>
> or put differently each request-method caters for a separate uri 
> space?  taking from there the symmetry between those spaces is 
> something you can or cannot want to achieve?
>
> (we're not used to look at this in this way, and I might be totally 
> off scale here)

I would tend to prefer a backend with the exact same URL space as the 
frontend, just providing different "views" on the data than the 
frontend does, for all the potential HTTP requests.

After years of trial and thinking, I believe the above is the best way 
of doing it.

>> <match pattern="*.xls">
>>   <select type="request-method">
>>     <when test="GET">
>>        <generate src="{1}.xml"/>
>>        <transform src="xml2poi.xls"/>
>>        <serialize type="hssf"/>
>>     </when>
>>     <when test="PUT">
>>        <generate type="xls2poi"/>
>>        <transform src="poi2sourcewrite"/>
>>        <transform type="sourcewrite"/>
>>        <serialize type="dummyserializer"/>
>>     </when>
>>     [...]
>> </match>
>> but this, apart from being awkward, doesn't work in general for all 
>> pipelines: think about aggregation at the very least.

Some high-end CMS (the good ones, not that stinking hyperexpensive 
vignette crap) implement the concept of webdav de-aggregators. But, 
IMHO, the complexity of implementation and configuration of those 
resources makes their use totally awkward.

IMO, for aggregation, one potential solution is to provide a 
sub-URL-space that is directly accessible from the backend 
(interestingly enough, this is the same concept that ReiserFS4 applied 
to pseudo-files)

Example, if on the frontend you have

  /page

which is an aggregated resource with parts "top" "navbar" "body" the 
backend might do

  /page -> PUT/POST forbidden
  /page/top
  /page/navbar
  /page/body

but note that this is *NOT* something that cocoon should decide 
automatically, but it's something that *you* should decide in your 
backend sitemap for your webdav application. because another way of 
doing the above is simply

  /page

where GET goes thru aggregation, identifying the non-editable parts with 
special IDs, then PUT goes thru a stylesheet that filters out the 
non-editable elements. This is poor man's de-aggregation, but it works 
and you decide your own.
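
A rough sketch of the "poor man's" variant. The text above does the 
filtering with a stylesheet; this stand-in uses Python's ElementTree 
instead, and the element names, the `id` values and the 
`READ_ONLY_IDS` set are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the aggregating GET pipeline marked these parts as non-editable.
READ_ONLY_IDS = {"top", "navbar"}

def strip_read_only(xml_text):
    """Drop every element whose id marks it as non-editable."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("id") in READ_ONLY_IDS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

# What a client might PUT back after editing the aggregated /page.
uploaded = (
    '<page>'
    '<div id="top">logo</div>'
    '<div id="navbar">links</div>'
    '<div id="body"><p>new text</p></div>'
    '</page>'
)
saved = strip_read_only(uploaded)
print(saved)
```

Only the `body` part survives the filter, so whatever the client did to 
the other parts never reaches the repository.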

My point is: symmetry is a holy grail, we should just come up with 
components and best practices to show people how to do stuff and they 
will build their own webdavapp.

The hard part is to let them know that webdav is nothing more than a 
few other actions on top of HTTP.

> isn't this aggregate example just showing that some GET-URI's are to 
> be considered as read-only? (not to be abused for a PUT that is)

In many situations, your webdavapp will forbid some actions on some 
resources, but this is very natural.
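
As a sketch of what "forbid some actions on some resources" could look 
like for the `/page` example: a per-resource table of allowed methods, 
answering 405 with an `Allow` header otherwise. The table contents and 
the `dispatch` function are hypothetical; in Cocoon this decision would 
live in your backend sitemap.

```python
# Assumption: which methods each backend resource accepts.
ALLOWED = {
    "/page": {"GET", "PROPFIND"},            # aggregated: read-only
    "/page/top": {"GET", "PUT", "PROPFIND"},
    "/page/navbar": {"GET", "PUT", "PROPFIND"},
    "/page/body": {"GET", "PUT", "PROPFIND"},
}

def dispatch(method, path):
    """Return (status, headers) for a request, without doing any real work."""
    allowed = ALLOWED.get(path)
    if allowed is None:
        return 404, {}
    if method.upper() not in allowed:
        # 405 Method Not Allowed must tell the client what is allowed.
        return 405, {"Allow": ", ".join(sorted(allowed))}
    return 200, {}

put_on_aggregate = dispatch("PUT", "/page")
put_on_part = dispatch("PUT", "/page/body")
print(put_on_aggregate)
print(put_on_part)
```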

> couldn't dav properties (PROPFIND?) provide such meta-data per GET-URI?
> is any usage of those properties in any way standardised?

very few dav properties are standardized. since we don't control the 
client side, we cannot make assumptions on these.

>> 2) direction: Cocoon is clearly designed for an "inside-out" type of 
>> flow in mind, while WebDAV is fully bidirectional.

this is not true anymore. with the ability to have pipelines dump their 
content on an outputstream if called from the flow, cocoon reached 
complete bidirectionality.

>> Design-wise it's difficult to adapt the G-T-S pattern to an incoming 
>> stream of data,

I can't see why. Admittedly, there are generators that are hardly 
reusable in both the in-out and out-in cases (StreamGenerator or 
RequestGenerator, for example) but that is not a deficiency of the 
pipeline design, especially now that the output stream of the pipeline 
is reconnectable.

>> when you're barely generating stuff (you're actually deserializing 
>> it) and, mostly, when you're not serializing anything but a simple 
>> response (think MKCOL, MOVE, DELETE and the like).
>
> this stuff sounds like flow integration on a separate section of the 
> uri-request-space?

I totally agree. i think it would be fairly easy to implement a full 
dav stack with flowscript and a few java components that wrap around a 
repository (could be as simple as a file system)
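
To give an idea of how small such a wrapper can be, here is a sketch in 
Python rather than flowscript/Java. The class `FileRepository` and its 
method names are made up, only MKCOL, DELETE and MOVE are covered, and 
MOVE always behaves as if the client had sent `Overwrite: F`. The 
status codes follow the WebDAV specification as I understand it.

```python
import shutil
import tempfile
from pathlib import Path

class FileRepository:
    """Hypothetical repository wrapper mapping DAV methods onto a directory."""

    def __init__(self, root):
        self.root = Path(root).resolve()

    def _resolve(self, url_path):
        target = (self.root / url_path.lstrip("/")).resolve()
        # Refuse paths such as "/../etc" that escape the repository.
        if target != self.root and self.root not in target.parents:
            raise ValueError("path escapes the repository root")
        return target

    def mkcol(self, url_path):
        target = self._resolve(url_path)
        if target.exists():
            return 405            # something is already mapped there
        if not target.parent.is_dir():
            return 409            # intermediate collection is missing
        target.mkdir()
        return 201

    def delete(self, url_path):
        target = self._resolve(url_path)
        if not target.exists():
            return 404
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
        return 204

    def move(self, src_path, dst_path):
        src, dst = self._resolve(src_path), self._resolve(dst_path)
        if not src.exists():
            return 404
        if not dst.parent.is_dir():
            return 409
        if dst.exists():
            return 412            # assumption: treated as "Overwrite: F"
        shutil.move(str(src), str(dst))
        return 201

with tempfile.TemporaryDirectory() as tmp:
    repo = FileRepository(tmp)
    results = [
        repo.mkcol("/products"),            # new collection
        repo.mkcol("/products"),            # already exists
        repo.mkcol("/a/b"),                 # parent "/a" is missing
        repo.move("/products", "/catalog"),
        repo.delete("/catalog"),
        repo.delete("/catalog"),            # gone by now
    ]
print(results)
```

None of these operations generates or serializes a document; each one 
only returns a status, which is why they fit the flow better than a 
pipeline.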

>> This said, I have no real solutions to that, but I'm very curious to 
>> learn more about your "extractor" concept. I think this is something 
>> needed, yes, but would that be enough?

Yes, i totally think so. once you are able to extract information from 
the pipeline that you need to process it, the sitemap+flow can do 
whatever you need, in a fully symmetrical way (if you wish to do so).

>>> webdav has been thought as a protocol for saving and retrieving 
>>> files, but this is, again, another file-system injected syndrome of 
>>> mod_dav. It
>> Though this makes it a tremendous tool too! The problem is that right 
>> now all the WebDAV implementations are "dumb" filesystems, where all 
>> you get is persistent storage. What I would love to see (and Cocoon 
>> would fit just perfectly) is the ability to build around the file 
>> system metaphor a whole set of components being able to react on the 
>> "filesystem" operation. In this case, a "save" (or "drag 'n drop") 
>> might
>
> see this makes me return to the uri-binding again...
> if we were to do this without webdav and only with POST and 
> file-upload stuff then the uri would be holding the 'action' that 
> webdav carries in his method

yes, it would be possible. but the good thing about dav is that many 
fat clients implement it (office, openoffice, photoshop) providing a 
super-easy way for people to interact with something that can be seen 
as a repository (and maybe, on the other hand, is just a cocoon 
wrapping a file system and a relational database, depending on the URL 
presented)

>> mean an email sent to an administrator, or a workflow procedure being 
>> started: as easy as that, no client needed, just what we already got, 
>> networked shares and (maybe) a web browser: who needs a CMS client 
>> anymore then? Probably only CMS administrator, not users. Or (again) 
>> think about views/facets: being able to glue the Cocoon power to 
>> WebDAV might mean giving different content to each user. Graphics 
>> might see only images, and only in hi-rez: Cocoon will take care of 
>> making scaled down versions, while hiding them from the users. 
>> Possibilities are endless.
>
> mmm, dreaming allowed...
> MOVE of a product.xml-file to another productline-collection results 
> in a sql update on the foreign-key relation ?

why not. we could be as wild as doing an SVG report graph of a 
relational table, modify it with illustrator, save it and alter the 
data in the table. How about that? ;-)

>>> I would love to see cocoon becoming a framework that can glue 
>>> together everything on the web, from stateless publishing to 
>>> stateful webdav applications and yeah, why not, once you can do 
>>> webdav applications you can do SOAP applications or XML-RPC 
>>> applications, they are, more or less, all XMLoverHTTP stuff.
>> Oh, me too, believe me! This might be the Next Big Thing (hey... 
>> wait, are we ready to be 10 years ahead of the crowd? ;-)).
>> Now for the big question: should we leave this discussion for now, 
>> focusing on the upcoming release and take webdavification as one of 
>> the major challenges for the next generation (this alone might be a 
>> good reason for Cocoon 3.0 IMHO), or should we have some more fun on 
>> the topic here and now?

I think we should get this release out of the door ASAP, then start 
thinking about what's next.

I just wanted to tell you that there is a lot of thinking to do about 
webdav but we are in pretty good shape with what we have.

> hehe, the avalanche has already started :-)
> managing the change into timing/planning and releases is a different 
> aspect, they can (and should) run in parallel IMHO
>
> the bigger challenge of being 10 years ahead is that these fast, wild, 
> non-domesticated, associated thoughts here and now aren't mature 
> enough to pull off anything and the discussion dries up before it 
> started... we shouldn't add a management constraint onto that IMHO

yes, but we shouldn't put too many irons in the fire either.

--
Stefano.

