cocoon-dev mailing list archives

From Daniel Fagerstrom <dani...@nada.kth.se>
Subject Re: [RT] Input Pipelines: Storage and Selection (was Re: [RT] Input Pipelines (long))
Date Tue, 14 Jan 2003 23:29:56 GMT
Stefano Mazzocchi wrote:

 > Sorry for taking so long.

I should be the one apologizing ;) If I could learn to write shorter RTs,
it would take less time both to answer them and to answer the answers.
Anyway, I'm always happy when you (and other people) give feedback on
my thoughts.

 > Daniel Fagerstrom wrote:

<snip/>

 > Once the Cocoon Environment is more balanced toward input, you can
 > have a uber-payload-generator that does everything and brews beer, or
 > you can have your own small personal generator that does what you want.

Agreed.

 >>   <generate type="xml"/>
 >>
 >> The idea is that if no src attribute is given, the sitemap interpreter
 >> automatically connects the generator to the input stream of the
 >> environment (the input stream from the HTTP request in the servlet
 >> case; in other cases it is less clear). This behavior was inspired
 >> by the handling of standard input in Unix pipelines.
 >
 >
 > Hmmm, interesting concept indeed, but I wonder if it's really
 > meaningful in our context. I mean, maybe there are generators that
 > don't need src and don't rely on input. But an idiotic TimeGenerator
 > is the only one I can think of... and that really doesn't stand up as
 > an argument, does it?

The TimeGenerator could just ignore the input stream, as many Unix
programs ignore standard input. But if the input stream is multipart MIME,
it is unclear whether we should feed the generator the whole multipart
message or just one part of it (this has been discussed in the
"StreamGenerator depends on Servlets!!!" thread, and I stated my current
opinion on the subject in the "[RT] Better Environment Abstraction"
thread). So IMO the idea doesn't hold up against reality; we need
something more explicit.

 > Nicola Ken proposed:
 >
 >>
 >>   <generate type="xml" src="inputstream://"/>
 >>
 >> I prefer this solution to mine, as it doesn't require any
 >> change to the sitemap interpreter. I also believe that it is easier
 >> to understand, as it is more explicit. It also (as Nicola Ken has
 >> explained) gives a good SoC: the URI in the src attribute describes
 >> where to read the resource from (e.g. input stream, file, CVS, HTTP,
 >> FTP), and the generator is responsible for how to parse the
 >> resource. If we develop an input stream protocol, all the work
 >> invested in the existing generators can immediately be reused in web
 >> services.
 >
 >
 > It is true that reduces the number of required generators. But there
 > is something about this that disturbs me even if I can't really tell
 > you what it is rationally... hmmm...

OK, we can discuss this later, or hope that the disturbing feeling just
fades away completely ;)
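The `inputstream://` idea splits *where* the bytes come from (the URI scheme) from *how* they are parsed (the generator). The Java sketch below illustrates that split under stated assumptions: `InputSourceSketch`, `register`, `resolve` and `generate` are hypothetical names, not Cocoon's source-resolution API, and the "generator" merely decodes text where a real one would emit SAX events.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a resolver picks a byte source by URI scheme,
// while the generator only decides how to parse the bytes it receives.
final class InputSourceSketch {
    // Scheme name -> function from the rest of the URI to a byte stream.
    private final Map<String, Function<String, InputStream>> protocols = new HashMap<>();

    void register(String scheme, Function<String, InputStream> opener) {
        protocols.put(scheme, opener);
    }

    // "Where": choose the protocol by the scheme before the first ':'.
    InputStream resolve(String uri) {
        int colon = uri.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No scheme in " + uri);
        }
        Function<String, InputStream> opener = protocols.get(uri.substring(0, colon));
        if (opener == null) {
            throw new IllegalArgumentException("Unknown scheme: " + uri);
        }
        return opener.apply(uri.substring(colon + 1));
    }

    // "How": this stand-in generator decodes UTF-8 text; a real XML
    // generator would hand the same bytes to a SAX parser instead.
    static String generate(InputStream in) {
        try (InputStream s = in) {
            return new String(s.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputSourceSketch resolver = new InputSourceSketch();
        byte[] requestBody = "<order id=\"1\"/>".getBytes(StandardCharsets.UTF_8);
        // Stand-in for the request body a servlet environment would supply.
        resolver.register("inputstream", rest -> new ByteArrayInputStream(requestBody));
        System.out.println(generate(resolver.resolve("inputstream://")));
    }
}
```

Registering a `file` or `http` opener would leave `generate` untouched, which is the reuse argument made for existing generators.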

<snip>Interesting discussion about validation, schemas, etc.</snip>

 >> Should validation be part of the generator or a transform step? I
 >> don't know.
 >
 > Transformation, for the simple reason that you might need to validate
 > a pipeline more than once.

I'm totally convinced :)

<snip>More interesting discussion about validation</snip>

 >> Mixing data and state information was considered bad practice
 >> in the discussion about pipe-aware selection (see references in [3]),
 >> which rules out using augmentation of the XML document as the only
 >> error-reporting mechanism. Throwing an exception would, AFAIU, make it
 >> difficult to give customized error reports. So I believe it
 >> would be best to put some kind of state-describing object in the
 >> environment and possibly combine this with augmentation of the XML
 >> document.
 >
 >
 > Yes, that would be my assumption too. And in case there is the need to
 > incorporate those validation mistakes back into the content, a
 > transformer (maybe even an XSLT stylesheet) can do that.
 >
 > This seems the cleanest solution to me.

Good.
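As a sketch of the agreed approach, assuming a plain `Map` stands in for the object model: a validating step records a state object and passes the document through unchanged, and a selector tests that state rather than the XML content. `ValidationStateSketch`, `ValidationResult` and the `"validation-result"` key are hypothetical names, and the "schema" check is a placeholder.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: validation state goes into the object model,
// not into the SAX stream, and selection reads that state.
final class ValidationStateSketch {
    // Hypothetical object-model key, not an existing Cocoon constant.
    static final String KEY = "validation-result";

    static final class ValidationResult {
        final boolean valid;
        final List<String> errors;

        ValidationResult(List<String> errors) {
            this.errors = errors;
            this.valid = errors.isEmpty();
        }
    }

    // Placeholder "schema": the document must contain a <name> element.
    static String validate(String xml, Map<String, Object> objectModel) {
        List<String> errors = new ArrayList<>();
        if (!xml.contains("<name>")) {
            errors.add("missing <name> element");
        }
        objectModel.put(KEY, new ValidationResult(errors));
        return xml; // content passes through; data and state stay separate
    }

    // Selector side: the equivalent of <when test="valid"> or "invalid".
    static boolean select(String test, Map<String, Object> objectModel) {
        Object state = objectModel.get(KEY);
        if (!(state instanceof ValidationResult)) {
            return false; // nothing has been validated yet
        }
        boolean valid = ((ValidationResult) state).valid;
        return "valid".equals(test) ? valid : "invalid".equals(test) && !valid;
    }

    public static void main(String[] args) {
        Map<String, Object> objectModel = new HashMap<>();
        validate("<person><age>3</age></person>", objectModel);
        System.out.println(select("valid", objectModel));   // false
        System.out.println(select("invalid", objectModel)); // true
    }
}
```

If the errors need to appear in the output, a later transformer could read `errors` from the same key and merge them into the document, as suggested above.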

<snip/>

 > we are reaching the point where pipeline selection cannot be
 > processed "a-priori" but must include
 > information on the run-time environment.

 > As much as I didn't like pipe-aware selection, I do agree that
 > validation-aware selection is a special case of pipe-aware selection, but it
 > *IS* very important and must be taken into consideration.
 >
 > Hmmm, this kinda sheds a totally different light on the concept of
 > selection (which has the interesting side effect of making selectors
 > and matchers even more different than they are today).
 >
 >> An alternative and more explicit way to describe the pipeline state
 >> dependent selection above, is:
 >>
 >> ...
 >>   <transform type="validator">
 >>     <parameter name="scheme" value="myInputFormat.scm"/>
 >>   </transform>
 >>   <serialize type="object-model-dom" non-terminating="true">
 >>     <parameter name="name" value="validated-input"/>
 >>   </serialize>
 >>   <select type="pipeline-state">
 >>     <when test="valid">
 >>       <generate type="object-model-dom">
 >>         <parameter name="name" value="validated-input"/>
 >>       </generate>
 >>       <transform type="xsl" src="myInputFormat2MyStorageFormat.xsl"/>
 >> ...
 >>
 >> Here the extensions to the current Cocoon semantics are put in the
 >> serializer instead of the selector. The sitemap interpreter treats a
 >> non-terminating serializer as an ordinary serializer in the sense that
 >> it puts the serializer at the end of the current pipeline and
 >> executes it. The difference is that, instead of returning to the
 >> caller of the sitemap interpreter, it creates a new current pipeline
 >> and continues to interpret the component after the serializer, in this
 >> case a selector. The sitemap interpreter will also ignore the output
 >> stream of the serializer; the serializer is supposed to have side
 >> effects. The new current pipeline will then get an
 >> ObjectModelDOMGenerator as its generator and an XSLTTransformer as its
 >> first transformer.
 >
 >
 > No, I'm sorry but I don't like this. I totally don't like the abuse of
 > serializers for this concept of 'intermediate-non-sax-stream'
 > components. It's potentially very dangerous; I see an incredible
 > potential for abuse.
 >
 > What do others think about this concept of pipelining pipelines? Isn't
 > this kind of recursion the mark of FS?
 >
 >> I prefer this construction compared to the more implicit one because
 >> it is more obvious what it does and also as it gives more freedom
 >> about how to store the user input.
 >
 >
 > True, but it also gives people more ability to abuse the system. Think
 > about internal pipelines, and views, and resources and aggregation...
 > have you thought about all the potential uses of this pipeline
 > pipelining in all current sitemap use cases?

From an implementation POV, it is a small extension of what I
implemented for pipe-aware selection. During that work I thought a lot
about the issues that you list, and also did some basic testing of the
implementation. My impression so far is that it seems to work, but most
certainly there are many subtle issues that I might have missed.
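To make the proposed interpreter rule concrete, here is a minimal Java sketch under loud assumptions: strings stand in for SAX streams, a `Map` stands in for the object model, and `PipelineChainSketch`, `Step` and `interpret` are hypothetical names, not Cocoon classes. A non-terminating serializer runs the pipeline built so far, its output is discarded so only its side effects count, and a new current pipeline starts with the next generator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the proposed non-terminating serializer rule.
final class PipelineChainSketch {
    enum Kind { GENERATE, TRANSFORM, SERIALIZE }

    static final class Step {
        final Kind kind;
        final BiFunction<String, Map<String, Object>, String> fn;
        final boolean nonTerminating; // only meaningful for SERIALIZE

        Step(Kind kind, BiFunction<String, Map<String, Object>, String> fn, boolean nonTerminating) {
            this.kind = kind;
            this.fn = fn;
            this.nonTerminating = nonTerminating;
        }
    }

    static Step generate(Function<Map<String, Object>, String> g) {
        return new Step(Kind.GENERATE, (ignored, env) -> g.apply(env), false);
    }

    static Step transform(UnaryOperator<String> t) {
        return new Step(Kind.TRANSFORM, (doc, env) -> t.apply(doc), false);
    }

    static Step serialize(BiFunction<String, Map<String, Object>, String> s, boolean nonTerminating) {
        return new Step(Kind.SERIALIZE, s, nonTerminating);
    }

    // Returns the output of the first terminating serializer.
    static String interpret(List<Step> steps, Map<String, Object> env) {
        String doc = null; // null means "no current pipeline yet"
        for (Step step : steps) {
            if (step.kind != Kind.GENERATE && doc == null) {
                throw new IllegalStateException("pipeline has no generator");
            }
            String result = step.fn.apply(doc, env);
            if (step.kind == Kind.SERIALIZE) {
                if (!step.nonTerminating) {
                    return result; // ordinary serializer: back to the caller
                }
                doc = null; // output ignored; a new current pipeline starts
            } else {
                doc = result;
            }
        }
        throw new IllegalStateException("no terminating serializer");
    }

    public static void main(String[] args) {
        Map<String, Object> env = new HashMap<>();
        List<Step> steps = List.of(
                generate(e -> "<input>hello</input>"),
                transform(String::toUpperCase), // stand-in for a validating transformer
                serialize((doc, e) -> { e.put("validated-input", doc); return ""; }, true),
                generate(e -> "<stored>" + e.get("validated-input") + "</stored>"),
                serialize((doc, e) -> doc, false));
        System.out.println(interpret(steps, env));
    }
}
```

Running `main` prints `<stored><INPUT>HELLO</INPUT></stored>`. The sketch leaves out selectors; in the proposal the component after the non-terminating serializer would be a `pipeline-state` selector rather than a bare generator.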

 > you are, in fact, proposing a *MAJOR* change in the way the pipelines
 > are setup. In short, more freedom and less pipeline granularity... but
 > sometimes it's good to make it harder for them to come up with
 > something... so they *THINK* about it.

Whether we choose to solve the need for pipe-state-dependent selection
by extending the semantics of selectors, by introducing "pipeline
pipelining", or even by leaving the sitemap as is and introducing the
possibility for flowscripts to use one pipeline for handling input and
another one for generating output, we will need to think more about
pipeline setup and semantics. Besides, AFAIU it is not a major change but
a small extension, at least from an implementation POV, but of course we
have to be very careful about the consequences.

 > Maybe I'm being too conservative, but I'm very afraid of all those
 > unplanned (and unwanted) changes that these new chained pipelines
 > could produce...

We solve this as always: by distributed thinking on the developer
list, by fleshing out a proposal, by implementing the proposal, and by
letting it live in the scratchpad until the concepts are considered
mature enough.

 > (besides, how do you stop them from wanting more than two pipelines?
 > should we?

No, if we introduce pipeline pipelining I see no reason for limiting it
to just two steps.

 > would you also like to chain a pipeline with a reader and then another
 > pipeline?)

Yes, that seems to be a natural consequence of the proposal.

 >
 >> Some people seem to prefer to store user input in Java beans; in some
 >> applications session parameters might be a better place than the
 >> object model.
 >
 >
 > I've seen the ugliest sitemaps coming out of exactly that concept of
 > storing everything in the sitemap and then parsing it back into the
 > pipeline... believe me, it's more abused than used correctly as it is
 > right now.

I think that the concept of transformers with side effects
(SQLTransformer etc.) might be part of the reason for ugliness in
sitemaps (I don't know what kind of ugliness you refer to, so I might be
completely off track). A transformer with side effects makes it possible
to store data, fetch new data, and pass data through, all in the same
step; this easily leads to mixing of concerns and unclean solutions. The
same goes IMO for storing data in generators or in actions. This is part
of the motivation for my proposal about "pipelining pipelines". IMO,
ideally a generator is the correct place for fetching binary data, a
transformer should just transform data and have no side effects, and a
serializer should write binary data (either to the response object or to
files, DBs, the environment, etc.). OK, this might be too purist and too
far away from current practice to be a realistic position, but I do
believe that mixing store, fetch, and transform operations in the same
pipeline step easily leads to unclean solutions.
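The "purist" division of labour described above can be sketched in a few lines of Java, assuming strings stand in for SAX streams and Java functional interfaces stand in for Cocoon component types; `PureStagesSketch` and `run` are hypothetical. Only the generator reads from the outside world, transformers are pure functions, and only the serializer writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical sketch: fetch, transform and store live in separate roles.
final class PureStagesSketch {
    static void run(Supplier<String> generator,
                    List<UnaryOperator<String>> transformers,
                    Consumer<String> serializer) {
        String doc = generator.get();      // fetch: the only read
        for (UnaryOperator<String> t : transformers) {
            doc = t.apply(doc);            // transform: no side effects
        }
        serializer.accept(doc);            // store or send: the only write
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>(); // stand-in for a database table
        UnaryOperator<String> rename = doc -> doc.replace("row", "record");
        run(() -> "<row>42</row>", List.of(rename), table::add);
        System.out.println(table); // [<record>42</record>]
    }
}
```

In this shape a component like the SQLTransformer, which both stores data and emits a status document, does not fit the transformer role; it would have to become a serializer followed by a separate generator, which is what the "pipelining pipelines" proposal aims at.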

 >
 >> Pipelines with Side Effects
 >> ---------------------------
 >>
 >> A common pattern in pipelines that handle input (at least in the
 >> applications that I write) is that the first half of the pipeline
 >> takes care of the input and ends with a transformer that stores the
 >> input. The transformer can be e.g. the SQLTransformer (with insert or
 >> update statements), the WriteDOMSessionTransformer, or the
 >> SourceWritingTransformer. These transformers have side effects: they
 >> store something and return an XML document that tells whether it
 >> succeeded or not. A conclusion from the threads about pipe-aware
 >> selection was that sending metadata, like whether the operation
 >> succeeded or not, in the pipeline is bad practice, and especially that
 >> we should not allow selection based on such content. Given that these
 >> transformers basically translate XML input to a binary format and
 >> generate an XML output that we are supposed to ignore, it would IMO
 >> be more natural to see them as some kind of serializer.
 >>
 >> The second half of the pipeline creates the response; here it is less
 >> obvious what transformer to use. I normally use an XSLTTransformer
 >> and typically ignore its input stream and only create an XML document
 >> that is rendered into e.g. HTML in a subsequent transformer.
 >>
 >> I think that it would be more natural to replace the pattern:
 >>
 >>   ...
 >>   <transform type="store something, return state info"/>
 >>   <transform type="create a response document, ignore input"/>
 >>   ...
 >>
 >> with
 >>
 >>   ...
 >>   <serialize type="store something, put state info in the environment"
 >>              non-terminating="true"/>
 >>   <generate type="create a response document" src="response document"/>
 >>   ...
 >>
 >> If we also give the serializer a destination attribute, all the
 >> existing serializers could be used for storing input in files, etc.
 >>
 >>   ...
 >>   <serialize type="xml" dest="xmldb://..." non-terminating="true"/>
 >
 >
 > Now, let me ask you something: how much have you been playing with the
 > FlowScript?

I did some experiments with integrating XMLForm in flowscripts half a
year ago, but found flowscripts very hard to debug, so I have not
introduced them in any of the commercial projects that I work on yet (my
boss seems to be wary of bleeding-edge technology sometimes). So, yes, my
thinking might clearly be biased, and it would be very interesting to
hear what people with more practical flowscript experience have to say
about these matters.

 > A while ago I proposed the ability to call a pipeline from the
 > flowscript while specifying the output stream that the serializer should
 > use. Basically, the flow can then use a pipeline as a tool to do stuff
 > without necessarily being tied to the client.

Seems like a very good idea. Now I wonder: is error handling for a form
page (one that is not part of a multipage wizard) something that should
be handled in flowscripts, or should it be handled in the sitemap? If it
should be handled in the sitemap, that is an argument for "pipelining
pipelines" and a destination attribute in transformers, as it would
otherwise mean that you could redirect serializer output if your page is
part of a flow, but not otherwise.

 > In all your discussion you have been placing a bunch of flow logic
 > (how to move from one pipeline to the next) into the sitemap. I'd
 > suggest to move it where it belongs (the flow) and let the sitemap do
 > its job (defining pipelines that others can use).
 >
 > Why? Well, while the concept of stateless output is inherently
 > declarative, the concept of stateless input + output is declarative
 > for the match and procedural for its internals.
 >
 > So, I wonder, why don't we leave the declarative part to the sitemaps
 > and use the flow as our procedural glue?

That might be a good way to do it, but in that case we need good design
patterns and support for handling the storing of input in flowscripts;
any ideas? I'm definitely not attracted by a scenario where we replace
the (anti)pattern "you can do anything you want with the input by
writing Java code in XSP or in actions" with the (anti)pattern "you can
do anything you want with the input in JavaScript in your flowscripts".
I'm not saying that you have ever proposed either of these antipatterns,
rather the opposite, but I know that many programmers seem to find them
effortlessly as soon as we don't give good examples of, and support for,
a better practice.

Today, one way of storing input is to use a transformer with side
effects. I have used such a solution in several projects, and IMO a
rather clean SoC between retrieving, transforming, and storing is at
least possible. As I have argued above, it would IMHO be even cleaner to
slightly extend the semantics of serializers, let serializers be the
components that store data, and in general let transformers be
side-effect free; however, this is no big deal. I need to see some
examples of storing input in flowscripts to see whether that is a better
place for such things.

 >>   ...
 >>
 >> This would give the same SoC that I argued in favour of in the
 >> context of input: the serializer is responsible for how to serialize
 >> from XML to the binary data format, and the destination is responsible
 >> for where to store the data.
 >
 >
 > This can be achieved with a flow method that includes a way to
 > specify the output stream (or a WriteableSource, probably better)
 > that the serializer has to use.

I'm still not convinced: why should it be a good idea to redirect the
output from a serializer to a writable source in the flowscript, but a
bad idea to do the same in the sitemap?
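The SoC at stake here (the serializer decides *how* a document becomes bytes, the caller decides *where* the bytes go) can be sketched independently of who the caller is. In this Java sketch, a plain `java.io.OutputStream` stands in for a WriteableSource, and `DestinationSketch`, `Serializer`, `XML` and `process` are hypothetical names, not Cocoon or flowscript API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the serializer owns the format, the caller owns
// the destination, whether that caller is a sitemap or a flowscript.
final class DestinationSketch {
    interface Serializer {
        void serialize(String doc, OutputStream out) throws IOException;
    }

    // "How": a trivial XML serializer writing UTF-8 with a declaration.
    static final Serializer XML = (doc, out) ->
            out.write(("<?xml version=\"1.0\"?>" + doc).getBytes(StandardCharsets.UTF_8));

    // "Where": any OutputStream works (response, file, database blob, ...).
    // The stream is flushed but not closed; the caller owns its lifecycle.
    static void process(String doc, Serializer serializer, OutputStream destination) {
        try {
            serializer.serialize(doc, destination);
            destination.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream store = new ByteArrayOutputStream(); // stand-in destination
        process("<order/>", XML, store);
        System.out.println(new String(store.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Since `process` never learns who chose the destination, the serializer code would be the same whether the choice is made through a `dest` attribute in the sitemap or through a flow method, which is the symmetry the question above points at.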

 >> Conclusion
 >> ----------
 >>
 >> I am afraid that I raise more questions than I answer in this RT. Many
 >> of them are of a "best practice" character, do not have any
 >> architectural consequences, and do not have to be answered right
 >> now. There are, however, some questions that need an answer:
 >>
 >> How should pipeline components, like the validation transformer,
 >> report state information? Placing some kind of state object in the
 >> object model would be one possibility, but I don't know.
 >
 >
 > The real problem is not where to store the data, IMO, but the fact
 > that you showed that there is a serious need for run-time selection
 > that can't be addressed with our today's architecture.

Agreed, let us focus on that part and try to decide which of the two
proposed ideas we should go for, or find a better one. AFAIU, both
proposals are rather straightforward to use in Cocoon and would cause
minor or no backward-incompatibility problems.

<snip/>

Thank you for taking the time to comment on the RT.

/Daniel Fagerstrom



---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org

