Message-ID:
<499888440608150722v3a373a57ic50b41d6714f582e@mail.gmail.com>
Date: Tue, 15 Aug 2006 10:22:18 -0400
From: "Tim Williams"
To: dev@forrest.apache.org
Subject: Re: [RT] A new Forrest implementation?
In-Reply-To: <44E1A613.3000309@apache.org>
References: <44E0D61D.40306@apache.org> <499888440608142010y31603d50y66f556301e0c172e@mail.gmail.com> <44E1A613.3000309@apache.org>

On 8/15/06, Ross Gardler wrote:
> Tim Williams wrote:
> > On 8/14/06, Ross Gardler wrote:
> >
> >> This is a Random Thought. The ideas contained within are not fully
> >> developed and are bound to have lots of holes. The idea is to promote
> >> healthy discussion, so please, everyone, dive in and discuss.
> > ...
> >
> > I think the Cocoon community has recognized the monolithic-ness of the
> > framework.  Stefano brought it up[1] and I think the responses are
> > encouraging - though the maven promises leave *very* much to be
> > desired as it has effectively stopped me from even attempting to build
> > their trunk.
>
> It has been discussed a great many times. Some progress has been made,
> but I very much doubt it will happen in a time frame sufficient to help
> Forrest. The thread you link to is certainly not the first that
> highlighted this issue.
>
> >> What Forrest Does
> >> =================
> >>
> >> Input -> Input Processing -> Internal Format -> Output Processing ->
> >> Output Format
> >>
> >> To do this we need to:
> >>
> >> - locate the source document
> >> - determine the format of the input document
> >> - decide which input plugin to use
> >> - generate the internal format using the input plugin
> >> - decide what output plugin we need
> >> - generate the output format using the output plugin
> >>
> >> Let's look at each of these in turn
> >
> >
> > Oversimplified but we'll see where you go with this...
>
> Please expand. Please add in the complexities that you see so that we
> can examine them.
>
> >> Locate the source document
> >> --------------------------
> >>
> >> To do this we use the locationmap, this is Forrest technology.
> >
> >
> > A lot of avalon and excalibur + a very little Cocoon for context and
> > (all things considered) wrapped up by a very little bit of Forrest
> > code.  I'm just suggesting that we've done nothing but wrap some
> > stuff here - "forrest technology" is a stretch.  To recreate it, we
> > could get context elsewhere but we'd need an equivalent to
> > avalon/excalibur I think.
>
> Come on, are you really claiming that we need Avalon+Excalibur+Cocoon to
> create a hashmap of possible matches to any given string?

I'm saying the matching/selection does not come from Forrest code. They would need to be implemented. Source resolution/validity does not come from Forrest code; it would need to be implemented.

> All we need is pattern matching followed by a lookup then a lookup. See
> my pseudo code later in the original post. The *concept* of the
> Locationmap is Forrest technology and it can be reproduced without any
> of the baggage Cocoon requires us to bring along.
>
> >> Decide which input plugin to use
> >> ---------------------------------
> >>
> >> This is done by resolving the processing request via the Cocoon sitemap.
> >> But why?
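To make the "pattern matching followed by a lookup" idea from the locationmap discussion above concrete, here is a minimal Java sketch. It is not Forrest code: the class name SimpleLocationMap, its methods, and the example pattern are all made up for illustration. It covers only the matching step; source resolution and validity checking, which currently come from Avalon/Excalibur, are deliberately left out and would still need an equivalent.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: an ordered table of regex patterns mapped to location templates. */
class SimpleLocationMap {
    // LinkedHashMap keeps insertion order, so the first registered match wins.
    private final Map<Pattern, String> entries = new LinkedHashMap<>();

    /** Registers a pattern; the template may refer to capture groups as $1, $2, ... */
    public void addMatch(String regex, String locationTemplate) {
        // Anchor the pattern so that it must cover the whole request string.
        entries.put(Pattern.compile("^(?:" + regex + ")$"), locationTemplate);
    }

    /** Returns the location for the first matching pattern, or empty if none matches. */
    public Optional<String> resolve(String request) {
        for (Map.Entry<Pattern, String> entry : entries.entrySet()) {
            Matcher m = entry.getKey().matcher(request);
            if (m.matches()) {
                // Expand $n references in the template from the matched groups.
                return Optional.of(m.replaceFirst(entry.getValue()));
            }
        }
        return Optional.empty();
    }
}
```

With an entry such as addMatch("(.*)\\.html", "src/docs/$1.xml"), a request for "index.html" resolves to "src/docs/index.xml". Whether this is really "all we need" depends on how much of the resolution and caching behaviour has to come along with it, which is the open question in this thread.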
> >>
> >> Each input type should only be processed by a single input plugin, there
> >> should be no need for complex pipeline semantics to discover which
> >> plugin to apply to a document, all we should need to do is look up the
> >> type of document in a plugins table.
> >
> >
> > And aggregates?  The end result isn't from a single document but an
> > aggregate of multiple data URIs - at least that's the dispatcher
> > plan as I understand it.
>
> All aggregates are about requesting multiple input sources and merging
> them together. Therefore aggregates do not belong here, they belong in
> the output plugin stage (so I'll come back to this later)
> > A cocoon transformer levies a pretty
> > minimal requirement: an XMLConsumer/XMLProducer (easy and natural, sax
> > event handlers and a single method respectively) and some simple
> > lifecycle contract methods needed for being a part of the managed
> > environment.
>
> I really should have been talking about the complexities of writing a
> generator, as we very rarely need to write transformers. Try writing a
> generator that, for example, uses hibernate to communicate with a
> relational database.

Same thing, except it's just a producer and not also a consumer. The code to do this will be almost exactly the same in any other SAX-event-streaming approach. But anyway...

import java.util.List;

import org.apache.cocoon.generation.AbstractGenerator;
import org.hibernate.Session;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Person and HibernateUtil are assumed to exist (as in the Hibernate docs).
public class HibernateGenerator extends AbstractGenerator {

  public void generate() throws SAXException {
    // SAX requires an Attributes object on every startElement call.
    AttributesImpl noAttrs = new AttributesImpl();
    contentHandler.startDocument();
    contentHandler.startElement("", "committers", "committers", noAttrs);
    List committers = listCommitters();
    for (int i = 0; i < committers.size(); i++) {
      Person indCommitter = (Person) committers.get(i);
      char[] name = indCommitter.getName().toCharArray();
      contentHandler.startElement("", "committer", "committer", noAttrs);
      contentHandler.startElement("", "name", "name", noAttrs);
      contentHandler.characters(name, 0, name.length);
      contentHandler.endElement("", "name", "name");
      contentHandler.endElement("", "committer", "committer");
    }
    contentHandler.endElement("", "committers", "committers");
    contentHandler.endDocument();
  }

  // Assumes an entity mapped as "Committers" whose rows load as Person objects.
  private List listCommitters() {
    Session session = HibernateUtil.getSessionFactory().getCurrentSession();
    session.beginTransaction();
    List result = session.createQuery("from Committers").list();
    session.getTransaction().commit();
    return result;
  }
}

No comments on code quality here ;)  I guess the point here is that you can come up with a complex "generator" requirement, but you've already admitted that SAX event-streaming is the way to go.  If this is true, then the complexity of turning some source content into SAX events will ultimately remain.  [Note: I've got no experience with Hibernate so this example is strictly based on their docs.]

> > I think being in some sort of managed environment (e.g.
> > Spring) is likely needed in any real-world approach.  So I'd turn this
> > around and ask where is the complexity?
>
> First complexity: building Cocoon
>
> Second complexity: building any component that has additional dependencies
>
> Third complexity: deploying a new (non-trivial) component within a plugin
>
> Fourth complexity: a community that is pulling in many different directions
>
> There are many more but I will leave it at that. If you don't agree then
> I suggest you actually try it before arguing the case. You can then tell
> me where I am going wrong.

"Actually try" what?
Surely you can be more constructive than questioning my credibility here?  I've built Cocoon before.  I am unable to do so now after the Mavenization.  I've expressed that frustration here and on the Cocoon list.  Building Cocoon is complex, I agree.  Inside, the TreeProcessor code is complex, I agree.  The standard components (Generator, Transformer, etc.) are not that complex.  Tell me what it is that you'd like me to "actually try" and I'll respond.

> Of course, it can be argued that 1-3 are because Forrest was built
> against a much older version of Cocoon and has failed to keep up (for
> example why are plugins not Cocoon blocks?). I would respond that this is
> because of the fourth complexity.
>
> So, then it can be argued that we should be contributing to Cocoon and
> helping resolve the fourth complexity. That may be the outcome of this
> RT, it may not.

Sounds reasonable.

> >> Decide what output plugin to use
> >> --------------------------------
> >>
> >> This is done by examining the requested URL. The actual selection of the
> >> output plugin is done within the Cocoon sitemap. I have all the same
> >> arguments here as I do for input plugins, this only needs to be a simple
> >> lookup, not a complex pipeline operation.
> >
> >
> > I get the feeling you're basing this on the simplest use-case
> > imaginable.  The output plugin is about the format of the output not
> > the content of the output.  The sitemap benefits here allow for more
> > complex processing (e.g. user profiling, smart content delivery, etc.)
>
> I disagree. The sitemap is a way of *configuring* this complex
> processing, it is not the processing itself. The sitemap has become an
> XML programming language and I hate it for that reason.
>
> Have you ever dived in to the implementation and tried to do anything
> useful in there?

Again, what implementation?  I've looked inside the TreeProcessor code in Cocoon, yes, and it is difficult to grasp.
I did this when doing the LM mounting stuff to see how mountnodes were implemented in the sitemap - I like to think this was useful.  I see no reason why the average user would care about this stuff though.

> The fact that the sitemap had become a programming language is one
> reason why Cocoon came up with the flow engine (e.g. to get rid of
> actions). But if you use the flow engine then you are programming with
> Javascript, it's only a small step from there to Java. So are there any
> benefits in using Javascript over Java?
>
> In my opinion the answer is a resounding no, at least for our use case.
>
> >> Generate the output format
> >> --------------------------
> >>
> >> This is typically done by an XSLT transformation and/or by a third party
> >> library (i.e. FOP) I have the same arguments here as I do for the
> >> generation of internal format documents, in fact the parts of Cocoon we
> >> use are identical in both cases.
> >
> >
> > Yeah, output is just a transformer.  Same thoughts as above.
>
> OK, back to aggregation since I argued earlier that it belongs here.
>
> Aggregation is nothing more than the collation of a number of resources
> in response to a single request. It turns a single request to a number
> of requests. Each individual request is handled just like any other
> request. So what you have is a locationmap something like this:
>
> [locationmap XML example not preserved in the archive]
>

Fair enough, move the aggregation to the Locationmap.  This looks very similar to the sitemap though, no?

> >> Caching
> >> -------
> >>
> >> Cocoon's Caching mechanism is pretty good, but it has its limitations
> >> within Forrest. In particular, we have discovered that the Locationmap
> >> cannot be cached efficiently using the Cocoon mechanisms.
> >
> >
> > This may be true.  We had a novice working on LM caching at the time
> > and I've learned quite a bit since then.  I'd like to re-evaluate this
> > before I'm willing to agree with such a bold statement.
>
> This illustrates my point exactly.
> I looked at this too and also failed
> to get a better solution.
>
> The reason I failed (and I guess the same for you) is that the code is
> just so complex and jumbled that it's next to impossible to find one's
> way around once one gets past the API.

I've documented my challenges somewhere.  It had to do with the timing of getCacheKey() and getValidity() for mounted maps I think - I'd have to go back and look.

> >> This is now
> >> one of the key bottlenecks in Forrest.
> >
> >
> > Based on?  I'd like to see this profiling data.  Knowing that the LM
> > is our way ahead I've been worried about squeezing every ounce where
> > we could but I was still under the impression that it isn't a
> > consequential performance bottleneck.
>
> Try building the Cocoon docs. It's set up on a Forrestbot in our zone.
> Even when co-located on the same physical machine as the source for the
> content it takes over 30 minutes to build. It really is a horrible solution.

My question was really whether you confirmed that the locationmap is the reason for this slowness.  I suspect it is not and, thus, not a "key bottleneck" in Forrest.

> If you want to profile it then you can get the forrest site from the
> Cocoon-Whiteboard.

I'll take a look to see if it's really the Locationmap that's the culprit there.

> This is an extreme example case, but one that is quite common in my
> experience using Forrest to do real document processing (as opposed to
> web site generation).

I'm disagreeing with your conclusion - that the LM is a key contributor to the performance problems.  I am not disagreeing with the performance problem itself.  For example, I think a much larger contributing factor is that we re-generate everything for changes that really impact only a small part of a site.  This has nothing to do with Cocoon baggage; we just have an implementation that isn't very efficient.
> >> We could work with Cocoon on their caching mechanism but there seems
> >> little interest in this since our use case here is quite unusual. Of
> >> course, we can do the work ourselves and add it to Cocoon. But why not
> >> use a caching mechanism more suited to our needs?
> >
> >
> > So it's not 100% suitable so it's worthless?  It fits in 98%
> > of our needs so I don't see this as a compelling argument.
>
> That's unfair. I'm saying it is not perfect, therefore it is not
> necessary to use it. I did not say it is not perfect so let's get rid of
> it. Please take this in the context of all the other problems I am
> highlighting rather than considering it as a single point.
>
> Besides it doesn't work for the locationmap, so in fact it is not used
> in some of the processing of every single request we make. That's
> considerably more than "2%"

Yeah, it's a baby-and-the-bathwater thing I think.  I'd rather figure out how to solve our problem with the current cache mechanism than see this as a reason to re-implement all of Forrest.  I'm just saying that of all the things that might motivate me to be involved in a re-implementation, this one doesn't strongly resonate with me.

> >> Ready Made Transformations
> >> --------------------------
> >>
> > ...
> >
> > You seem to be
> > suggesting that Cocoon requires some big overhead to do transforms and
> > that's simply not the case.
>
> That's right, I call 40Mb of bloat a fairly big overhead for doing XSLT
> transformations.
>
> This time I really am oversimplifying, but I hope you see my point -
> certainly that is how my customers see it. As a result I ended up, in
> most cases, writing a series of Java components that I wired together
> manually and plugged directly into whatever framework they were using.
> This RT is about doing this in a more flexible and reusable way.

Your customers are likely just intimidated by the Cocoon learning curve itself rather than 40Mb of jar files.
Many of the libraries would be needed regardless I think.  Avalon would need to be replaced with another container that would likely be at least as large.  batik, fop, jtidy, excalibur, etc. are all still needed.

> >> This complexity makes it difficult for newcomers to get started in using
> >> Forrest for anything other than basic XSLT transformations.
> >
> > ...
> >
> > My point is that newcomers are
> > going to find it difficult to deal with any framework that attempts to
> > achieve anything beyond the simplistic.
>
> Yes, but if the framework is designed to do one job (publishing in our
> case) then it is simpler to understand than if it is designed to do
> every job (as with Cocoon).
>
> >> The end result is that we have only one type of user - those doing XSLT
> >> transformations.
> >>
> >> Plugin Selection
> >> ----------------
> >>
> >> This is done through the sitemap. This is perhaps where the biggest
> >> advantage of Cocoon in our context can be found. The sitemap is a really
> >> flexible way of describing a processing path.
> >>
> >> However, it also provides loads of stuff we simply don't need when all
> >> we are doing is transforming from one document structure to another. This
> >> makes it complex to new users (although having our own sitemap
> >> documentation would help here).
> >>
> >> Finally, as discussed in the previous section, we don't need a complex
> >> pipeline definition for our processing, we just need to plug an input
> >> plugin to an output plugin via our internal format and that is it. We
> >> have no need for all the sitemap bells and whistles.
> >
> >
> > I'm struggling to figure out what you think is forcing us into our
> > current apparently overly complex solution.  Is it the sitemap grammar
> > that is complex?
>
> Not the grammar itself (although I do hate the fact that we are now
> programming using the sitemap).
> The complexity is in processing of that
> grammar which results in the selection of the processing path to take.

I don't understand.  TreeProcessor?  NodeBuilders?  Matchers?

> All we need to do is select the right plugins and make them work
> together. Look at how many internal pipeline requests there are to do
> this in Forrest now (it's even worse if we use the dispatcher).
>
> This is overly complex for what is ultimately a couple of lookups.

I'll hopefully find time later to look at your pseudo-code and maybe it'll make more sense to me.  Right now, I'm just seeing what goes on as much more than a "couple lookups".

> > Learning curves aside, I'd rather sit on top of a framework that
> > supports a more complex solution than is my current problem because
> > experience has shown me that the initial requirements grow and I don't
> > want to have to port when that growth happens.
>
> This is exactly why I hate "catch all" frameworks. They try to be all
> things to all people. I prefer to use what I need now and look at
> expanding things when I find a use case that requires it. How can you
> know in advance that the framework you choose is going to be adequate
> for the job in hand? How do you know you won't need Struts, or Ruby On
> Rails, or Wicket or SpringMVC or whatever?
>
> This is personal opinion and we should really leave it at the door.
> Different people for different things. Our job is to decide what is best
> for the project not for us as individuals. I'll just leave you with one
> thought...
>
> If I'm going hiking I do not struggle carrying a family tent on my back
> just because I may have some more children at some point in the future.

Ok, we'll drop this line of thought as you suggest...

> >> Conclusion
> >> ----------
> >>
> >> Cocoon does not, IMHO, bring enough benefits to outweigh the overhead of
> >> using it.
> >>
> >> That overhead is:
> >>
> >> - bloat (all those jars we don't need)
> >
> >
> > this is going to be addressed with maven (argghhh) and/or osgi someday
> > - it's a recognized issue by many cocooners.
>
> "someday" is the operative word there. I've been waiting too long.

C'mon, you're an OS veteran here.  Patches welcome, right?

> If we reject this RT based on this argument then I want to see Forrest
> developers helping Cocoon sort this out rather than standing by waiting
> for it to happen.

Ok, I threw the "maven" thing in with fingers crossed.  I'd rather they go back to ant personally, maven is silly.  I have a high-speed connection and it takes forever to download libs each time I *attempt* to build only to see it fail 10 minutes into it.  Argghhh...

> >> - complex code (think of your first attempt to write a transformer)
> >
> >
> > I've never written a transformer.  I suspect that I could do it in a
> > day or less though depending upon the requirements.  It's simply
> > implementing XMLConsumer by handling SAX events, not that
> > extraordinary for a SAX-stream-based framework.  How do the many other
> > pipeline frameworks do transforms if not by handling SAX events?
>
> Yes, transformers are simple. I should have picked non-trivial
> generators as discussed above. Especially since this is a more common
> requirement in the real world. That is, we need input plugins to interface
> with existing corporate legacy code.
>
> >> - complex configuration (sitemap, locationmap, xconf)
> >
> >
> > Like component managers nowadays, we've failed to strike a good
> > balance between flexibility (configurability) and ease of use.
>
> I really can't agree with the "like component managers nowadays" part.
> Have you actually worked with something like Spring? It is unbelievably
> simple.
>
> >> - based on Avalon which is pretty much dead as a project
> >
> >
> > They are at least partially migrated to Spring for management
> > purposes.
> > I understood that as a move to eventually migrate fully
> > from Avalon to Spring.
>
> Don't be fooled by the "headlines". Look into the code. Until the Avalon
> jars are gone then my point stands. Until someone here gets into the
> Cocoon code and starts trying to disentangle things then my point stands.

Until the Avalon jars are gone?  That's not fair really.  That's black and white and doesn't allow for any recognition of progress.  Let's take a look at the progress...

// Excerpt: release(Object), also required by ServiceManager, is not shown.
final class AvalonServiceManager implements ServiceManager, BeanFactoryAware {

  protected BeanFactory beanFactory;

  public void setBeanFactory(BeanFactory beanFactory) throws BeansException {
    this.beanFactory = beanFactory;
  }

  public boolean hasService(String role) {
    return this.beanFactory.containsBean(role);
  }

  public Object lookup(String role) throws ServiceException {
    return this.beanFactory.getBean(role);
  }
}

Looks to me like the headlines were correct in this case.  More or less a light wrapping around Spring.  Spring is doing the heavy lifting behind the scenes.  It's a whole lot of work to rip out the Avalon interfaces so I understand the desire to just wrap it for now.

> Why don't I do that? I have other things to do, I need Forrest to be
> useful, I don't use, and have never used, Cocoon independently of
> Forrest (at least not commercially).
>
> >> So Should We Re-Implement Forrest without Cocoon?
> >> =================================================
> >>
> >> In order to find an answer to this question let's consider how we might
> >> re-implement Forrest without Cocoon:
> >>
> >> Locate the source document
> >> --------------------------
> >>
> >> We do this through the locationmap and can continue to do so. We would
> >> need to write a new locationmap parser though.
> >> This would simply do the
> >> following (note, no consideration of caching at this stage, but there
> >> are a number of potential cache points in the pseudo code below):
> >
> >
> > Assumes that matching and selection have already been implemented
> > somewhere?
>
> Yes, the way I see it, regular expressions are pretty standard and well
> supported.
>
> ...
>
> >> Generate the internal document
> >> ------------------------------
> >>
> >> Since the plugins are now loaded via a component manager our
> >> transformation classes are POJO's that are independent of any particular
> >> execution environment, therefore, there is no need to do anything
> >> clever here.
> >
> >
> > I don't understand.  They need input/output contracts, right?  There
> > aren't standards defined for such things so it is execution
> > environment dependent.  The concept of a POJO is honestly really gray
> > to me.  I view Cocoon's transformation classes as POJO's.  I've tried
> > to grasp this POJO concept before and gotten lost.  The Java community
> > certainly has a knack for the creation of buzzwords with blurry
> > meaning.
>
> I'm not really using POJO in the correct context here. All a plugin
> needs is a method to do its stuff. This could be called "execute". The
> input would be a SAX stream (for which there are multiple standard
> implementations), the output would also be a sax stream.
>
> There is no dependency on anything else. Even the container manager in
> use would be independent from the plugins and could be replaced at any time.

Again, strictly talking about the components, what you describe above as a "plugin" is an implementation of XMLProducer and XMLConsumer.  I'm not seeing the benefit/difference but don't waste time on responding until I actually put the effort into looking at your pseudo-code.

> >> So is this interesting or not?
> >
> >
> > Not so far... I'm not convinced.
> > I think you're implicitly
> > describing an oversimplified use-case, overstating the complexity of
> > Cocoon, and glossing over what we get from Cocoon.  More to come...
>
> Tim, you have argued against my points, are there none that you see
> merit in? It would be helpful if you could highlight any points that you
> feel are valid, even just by saying "yes, OK". This will enable us to
> pull the good stuff out of this thread and to let the bad stuff just rot
> away.

Fair enough, I'll try to do this when I respond tonight to the other half of your first mail.

--tim
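
For reference, the "SAX stream in, SAX stream out" plugin contract discussed above can be sketched with nothing but the SAX and JAXP classes that ship with the JDK. This is only an illustration under stated assumptions, not the design proposed in the RT: RenamePlugin, PluginDemo, and the committer-to-person renaming are invented for the example, and XMLFilterImpl is used as a stand-in that plays both the consumer and the producer role (much like Cocoon's XMLConsumer plus XMLProducer).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical plugin: renames committer elements to person as SAX events stream through. */
class RenamePlugin extends XMLFilterImpl {
    private static String rename(String name) {
        return "committer".equals(name) ? "person" : name;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        // Consume the incoming event, then produce the (possibly renamed) event downstream.
        super.startElement(uri, rename(local), rename(qName), atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, rename(local), rename(qName));
    }
}

/** Hypothetical driver: parser -> plugin -> serializer, wired by hand with no container. */
class PluginDemo {
    static String run(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();

            RenamePlugin plugin = new RenamePlugin();
            plugin.setParent(parser); // the plugin receives the parser's SAX events

            // An identity transform serializes whatever events the plugin emits.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                    new SAXSource(plugin, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The per-event code in such a plugin is essentially what a Cocoon transformer would contain, which is the point made above about the component contracts; what differs is the surrounding wiring (parser, chaining, serialization), done here by hand instead of through the sitemap.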