cocoon-dev mailing list archives

From Grzegorz Kossakowski <g...@tuffmail.com>
Subject Re: [C3] Pipeline component event types
Date Tue, 13 Jan 2009 21:23:55 GMT
Steven Dolg wrote:
>> It depends how you define being successful. I've managed to express
>> this rather simple idea but the code is horrible
>> thus I consider it as a failure.
>>   
> Well cleaning up working code cannot be that hard - you haven't invested
> years of time, have you ;-)

The problem is that I currently see no way to clean it up. I can't shake the
feeling that I've hit the limitations of Java as a language. As I said in my
original e-mail, I would like to be proven wrong, so any patches are welcome. :-)

You can treat it as a nice exercise in using Java generics. If you are curious,
the limitations I see are:
1. I would like to declare PipelineComponent.execute as a method with a
signature like:
Event|Nothing|Continue -> Event|Nothing|Continue.
By Event|Nothing|Continue I mean a case type where you can pass either a Nothing
object, a Continue object, or an object that extends Event. There are no case
types in Java, so I had to introduce the interfaces Continue and Event. But even
this does not solve the problem, because the Nothing and Continue
implementations will not extend the specific event type that a component
accepts. Conceptually this is not a problem, as Nothing and Continue events are
completely different cases and should be handled differently. The problem is
how to express this idea in Java in a concise way.

2. Have a look at PipelineImpl. It has two subclasses. Ugly, right? But try to
get rid of them. You would need something like:

private Pipeline<T, W> pipeline;
private PipelineComponent<W, U> component;

This way we express that the component accepts what the pipeline produces, but
W is not defined anywhere. What we really need is a tuple, so you could say
something like:
private <W extends Event> (Pipeline<T,W>, PipelineComponent<W, U>) pipelineAndComponent;

Or something like that. Again, I have no idea how to express this in Java in a _concise_ way.
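
One possible way to approximate the tuple from point 2 in plain Java, offered
only as a sketch: capture W as a type parameter of a small helper class that
itself implements the pipeline interface, so callers only ever see
Pipeline<T, U>. The interfaces below are simplified stand-ins, not the actual
C3 code, and the names Link, andThen, of and run are hypothetical.

```java
// Simplified stand-ins for the types discussed above (assumptions, not the C3 sources).
interface Event {}

interface PipelineComponent<I extends Event, O extends Event> {
    O execute(I input);
}

interface Pipeline<T extends Event, U extends Event> {
    U run(T input);

    // Appending a component yields a new pipeline; the old output type U stays hidden in Link.
    default <V extends Event> Pipeline<T, V> andThen(PipelineComponent<U, V> next) {
        return new Link<T, U, V>(this, next);
    }

    // Wraps a single component as a one-stage pipeline.
    static <A extends Event, B extends Event> Pipeline<A, B> of(PipelineComponent<A, B> first) {
        return first::execute;
    }
}

// The "tuple": W ties the pipeline's output to the component's input in one place.
final class Link<T extends Event, W extends Event, U extends Event> implements Pipeline<T, U> {
    private final Pipeline<T, W> pipeline;
    private final PipelineComponent<W, U> component;

    Link(Pipeline<T, W> pipeline, PipelineComponent<W, U> component) {
        this.pipeline = pipeline;
        this.component = component;
    }

    @Override
    public U run(T input) {
        return component.execute(pipeline.run(input));
    }
}

// Two illustrative event types.
final class TextEvent implements Event {
    final String text;
    TextEvent(String text) { this.text = text; }
}

final class LengthEvent implements Event {
    final int length;
    LengthEvent(int length) { this.length = length; }
}

final class LinkDemo {
    // Builds trim -> length and runs it once.
    static int lengthOfTrimmed(String raw) {
        Pipeline<TextEvent, LengthEvent> p = Pipeline
                .<TextEvent, TextEvent>of(e -> new TextEvent(e.text.trim()))
                .andThen(e -> new LengthEvent(e.text.length()));
        return p.run(new TextEvent(raw)).length;
    }

    public static void main(String[] args) {
        System.out.println(lengthOfTrimmed("  abc "));
    }
}
```

This does not remove the extra class, it only makes it generic and reusable, and
it ignores Nothing and Continue entirely, so it addresses point 2 only.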

> That adding a component actually returns a different pipeline is an
> interesting approach but I'm not sure I want to declare a new variable
> for each of the new pipelines.

Yep, that's a valid concern. This kind of construct is functional in style, and
it forces you to use it differently. I don't want to go into details, but the
main idea is that pipeline construction is handled by various functions and you
just pass the partial pipeline around without introducing any additional
variables. This is similar to method chaining (or method combining) in Java,
with the difference that in functional languages combining functions is seen as
a basic programming technique. As we are probably going to stay with Java, I
would like to see if this inspires someone else to come up with a plain Java
counterpart.

> And method chaining is not really my idea of readable code.

It depends on your point of view, but I more or less agree that in most cases
it's not readable.

> Also I'm wondering what return type a SAXSerializer would have or what
> event types SAX uses.

We would have to define our own type that represents SAX events as simple
classes instead of as method calls, which is the standard way. I know that
defining our own APIs is not ideal, but the original idea of passing events by
method calls (even if it was motivated by performance considerations) wasn't
that good. Anyway, we already had this kind of discussion when the StAX research
was discussed.
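
To make the idea of "SAX events as simple classes" concrete, here is a rough
sketch that adapts the standard callback API into a list of event objects. The
event classes (SaxEvent, StartElement, EndElement, Characters) and the
EventCollector adapter are hypothetical names invented for this example;
attributes and namespaces are ignored for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical event classes; the names are illustrative only.
interface SaxEvent {}

final class StartElement implements SaxEvent {
    final String qName;
    StartElement(String qName) { this.qName = qName; }
    @Override public String toString() { return "<" + qName + ">"; }
}

final class EndElement implements SaxEvent {
    final String qName;
    EndElement(String qName) { this.qName = qName; }
    @Override public String toString() { return "</" + qName + ">"; }
}

final class Characters implements SaxEvent {
    final String text;
    Characters(String text) { this.text = text; }
    @Override public String toString() { return text; }
}

// Adapter: every SAX callback becomes one event object.
final class EventCollector extends DefaultHandler {
    final List<SaxEvent> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add(new StartElement(qName));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add(new EndElement(qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add(new Characters(new String(ch, start, length)));
    }
}

final class SaxEventDemo {
    // Parses the XML string and returns the events in document order.
    static List<SaxEvent> parse(String xml) {
        EventCollector collector = new EventCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), collector);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return collector.events;
    }

    public static void main(String[] args) {
        System.out.println(parse("<a><b/></a>"));
    }
}
```

Collecting everything into a list defeats streaming, of course; the point here
is only the shape of the event classes.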

> Are those event types just for the compiler or are they actually used to
> pass the data around?

They would be used for passing data around. You can see examples in the classes
implementing the reworked interfaces. For example, if a serializer produces an
output stream, it just emits *one* event called OutputStreamEvent. Or, if we
want partial results from this kind of serializer, it could emit many events,
each containing just a fragment of the final output. Speaking of partial
results, I remember you already asked about them in an earlier e-mail.

I would like to explain one nice "side effect" of my design. I'll show how one
functional concept, lazy evaluation of expressions, can be easily implemented
in pipelines. To explain it, I'll first introduce my view of pipelines and
pipeline components.

A pipeline component is just a function
f: Event|Nothing|Continue -> Event|Nothing|Continue.
The Nothing and Continue events will be explained later. If we have
f_1, f_2, ..., f_n, a pipeline is just a function composition:
(f_n * ... * f_2 * f_1)(x) = f_n(...(f_2(f_1(x)))...)

What makes a pipeline different from ordinary function composition is, in my
opinion, that each function can emit a partial result based on partial input. A
partial result/input is just a sequence of events, each of which is different
from the Continue event. The full result of a function's execution is just a
sequence of events terminated by a Nothing event (which is a special marker
object). The property of returning partial results makes functions (pipeline
components) streamable. As a result we can, for example, send fragments of an
HTML page to the browser as soon as they are computed, without waiting for all
events to be processed.

If you are wondering how a generator is defined, it's just a function
g: Nothing -> Event|Nothing. This definition reflects the fact that a generator
is a special function that _generates_ events out of nothing, from the
pipeline's point of view. It does not base its output on any incoming events but
on some external data source that is unknown to the pipeline and outside its
scope. Once the generator has emitted all its events, it signals this with
Nothing, so its result is a sequence of events terminated by Nothing.

Now let's discuss Continue. This is a helper object that functions can emit to
express the fact that they need more input events before they can produce any
portion of the result. Think of a transformer that replaces some fragment of
XML with another fragment of XML based on what was in the original fragment. It
has to collect all events representing the original XML fragment before it can
produce new events. You will notice that the word "collecting" implies some
buffering, but I won't go into that, as I want to focus on other aspects and
not on implementation details.
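
The three kinds of values described so far (ordinary events, Nothing and
Continue) could be modelled in Java roughly as below. This is only a sketch
under assumptions: Signal, Item, Component and PairJoiner are hypothetical
names, and events are plain strings. PairJoiner is a toy buffering component
that needs two inputs per output, so it answers the first input of each pair
with Continue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical encoding of Event|Nothing|Continue as one closed class family.
abstract class Signal<E> {
    private Signal() {}  // only the nested classes below may extend Signal

    static final class Item<E> extends Signal<E> {
        final E event;
        Item(E event) { this.event = event; }
    }
    static final class Nothing<E> extends Signal<E> {}
    static final class Continue<E> extends Signal<E> {}

    static <E> Signal<E> item(E event) { return new Item<E>(event); }
    static <E> Signal<E> nothing() { return new Nothing<E>(); }
    static <E> Signal<E> more() { return new Continue<E>(); }

    boolean isNothing() { return this instanceof Nothing; }
}

// f: Event|Nothing|Continue -> Event|Nothing|Continue
interface Component<I, O> {
    Signal<O> execute(Signal<I> input);
}

// Joins events in pairs; buffers the first half and asks for more input.
final class PairJoiner implements Component<String, String> {
    private String pending;  // buffered first half of a pair, or null

    @Override
    public Signal<String> execute(Signal<String> input) {
        if (input instanceof Signal.Item) {
            String text = ((Signal.Item<String>) input).event;
            if (pending == null) {
                pending = text;
                return Signal.more();      // need one more event
            }
            String joined = pending + text;
            pending = null;
            return Signal.item(joined);
        }
        if (input.isNothing() && pending != null) {
            String rest = pending;         // flush the leftover before finishing
            pending = null;
            return Signal.item(rest);
        }
        // Nothing with an empty buffer ends the output; Continue is passed along.
        return input.isNothing() ? Signal.<String>nothing() : Signal.<String>more();
    }
}

final class SignalDemo {
    static String describe(Signal<String> s) {
        if (s instanceof Signal.Item) return "Item(" + ((Signal.Item<String>) s).event + ")";
        return s.isNothing() ? "Nothing" : "Continue";
    }

    // Feeds a, b, c, Nothing, Nothing and records what the component answers.
    static List<String> trace() {
        Component<String, String> joiner = new PairJoiner();
        List<Signal<String>> inputs = new ArrayList<>();
        inputs.add(Signal.item("a"));
        inputs.add(Signal.item("b"));
        inputs.add(Signal.item("c"));
        inputs.add(Signal.<String>nothing());
        inputs.add(Signal.<String>nothing());
        List<String> out = new ArrayList<>();
        for (Signal<String> in : inputs) out.add(describe(joiner.execute(in)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace());  // [Continue, Item(ab), Continue, Item(c), Nothing]
    }
}
```

Note that the joiner answers the first Nothing with its buffered leftover and
only the second Nothing with Nothing, so whoever drives the pipeline has to keep
sending Nothing until the component finishes.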

With these rather precise definitions in front of us, we can move on to the
laziness property of pipeline execution. The definition of function f (a
pipeline component) does not say precisely when f may emit a Nothing event. It
wasn't part of the definition, but f must satisfy one property: f emits Nothing
after receiving a finite number of Nothing events (finding out why is left as an
exercise for the reader). Beyond that, f can emit Nothing in response to any
kind of event.
Let's consider an example:
Pipeline P1: f_1 -> f_2 -> f_3
Pipeline P: P1 -> f_4 -> f_5

Let's assume that f_1 is a generator, producing a large stream of events from a
big XML file or from some records. Let's also assume that f_2 is a simple
function doing some simple transformation such as text formatting. Finally, f_3
is a query function with a query defined like: NumberOfRecord() <= 20. This
means that in pipeline P1 we want to extract only the first 20 records of the
big file. After consuming 20 records, f_3 simply returns a Nothing event to say
that this is the end of the result for f_3.

For pipeline execution this means that once Nothing is received from f_3, the
whole P1 pipeline can be discarded and execution should continue with f_4 and
f_5. As a consequence, the rest of that big XML file won't be read.
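
The P1 example can be sketched as a small pull-based chain to show the effect.
This is a simplification under stated assumptions: Optional.empty() stands in
for the Nothing event, Continue is left out, and Stage, CountingGenerator,
Mapper and Limit are hypothetical names playing the roles of f_1, f_2 and f_3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Pull-based stage; Optional.empty() plays the role of Nothing (an assumption of this sketch).
interface Stage<E> {
    Optional<E> next();
}

// f_1: generates records 1..total and counts how many were actually produced.
final class CountingGenerator implements Stage<Integer> {
    private final int total;
    private int produced = 0;

    CountingGenerator(int total) { this.total = total; }

    @Override
    public Optional<Integer> next() {
        if (produced >= total) return Optional.empty();
        produced++;
        return Optional.of(produced);
    }

    int produced() { return produced; }
}

// f_2: a simple per-event transformation.
final class Mapper<I, O> implements Stage<O> {
    private final Stage<I> upstream;
    private final Function<I, O> fn;

    Mapper(Stage<I> upstream, Function<I, O> fn) {
        this.upstream = upstream;
        this.fn = fn;
    }

    @Override
    public Optional<O> next() { return upstream.next().map(fn); }
}

// f_3: lets through at most `limit` events, then reports Nothing on its own.
final class Limit<E> implements Stage<E> {
    private final Stage<E> upstream;
    private int remaining;

    Limit(Stage<E> upstream, int limit) {
        this.upstream = upstream;
        this.remaining = limit;
    }

    @Override
    public Optional<E> next() {
        if (remaining <= 0) return Optional.empty();  // upstream is never asked again
        remaining--;
        return upstream.next();
    }
}

final class LazyPipelineDemo {
    // Returns {events delivered by P1, events the generator actually produced}.
    static int[] run(int available, int limit) {
        CountingGenerator f1 = new CountingGenerator(available);
        Stage<String> f2 = new Mapper<Integer, String>(f1, n -> "record-" + n);
        Stage<String> p1 = new Limit<String>(f2, limit);
        List<String> received = new ArrayList<>();
        for (Optional<String> e = p1.next(); e.isPresent(); e = p1.next()) {
            received.add(e.get());
        }
        return new int[] { received.size(), f1.produced() };
    }

    public static void main(String[] args) {
        int[] result = run(1_000_000, 20);
        System.out.println(result[0] + " delivered, " + result[1] + " generated");
    }
}
```

With a million records available and a limit of 20, the generator is asked for
exactly 20 records; the remaining input is never touched.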

I won't give you a formal definition of laziness, but I'm sure you get my
point. With this kind of pipeline design we get laziness almost for free, which
is a nice bonus, isn't it?

                                                             ---- o0o ----


OK, this e-mail got rather lengthy, but it gave me a chance to explain how I see
Cocoon pipelines on paper and to introduce the concept of lazy evaluation of
pipelines. For you, it was an opportunity to see what has influenced my current
view of pipeline design.

Thank you for your attention.

-- 
Best regards,
Grzegorz Kossakowski
