From: "Marc Portier" <mpo@outerthought.org>
To: forrest-dev@xml.apache.org
Subject: [proposal] Design for build(s) and related.
Date: Mon, 19 Aug 2002 14:20:16 +0200

Hi all,

I've been promising some actual work (build refactoring, libre redesign, ...) and recently I haven't got any further than submitting a documentation change of a single & into a % character. Although not visible to you, I have been spending quite some time thinking about all of this...

...my brain keeps running circles over a number of topics. Some of them have been half-expressed in threads here and there; I needed this round-up at least for my own sanity, and maybe it helps your mental health as well :-)

I would like to be as productive with your time as I possibly can, yet I think some random thoughts need to be expressed to make myself (totally un-) clear.
Bottom line: I'm not certain enough about these things to call for votes; at this stage opinions and feedback would be great. My observations and motivations are honest but my knowledge is limited, so please help out in the sections where I'm just plain wrong or incomplete. Challenging is great; offering corrections and other solutions is even better. Be welcome to yet another big fit-in exercise.

-o0o-

[opinion: on the end of meaning of filename-extensions]

What is XML doing to our filename extensions? It is making them damn useless, that is what! Since everyone has become so fond of this XML thing, the only file extension en vogue seems to be .xml, and no matter what the file is about, someone decided that it should open in ie6 when I double-click it.

In fact that is not too bad, since every now and then that _is_ quite satisfactory in terms of getting a view on what is in there. On the edit side of things it's a bit different: I'm using a number of different XML editors, and each of them tends to be used for different doctypes.

- Pollo for cocoon xmaps, ant build.xml, ...
- XMetaL for the document-writing ones: ot-presentation-ware, xdocs, xhtml
- gvim for the short ones and/or when the change is a quick hack (or I don't want my editor to do whitespace rewritings that mess up cvs diffs)
- XMLSpy for the tabular-like DTDs
- eXcelon Stylus for the non-doctyped ones that get defined as I write them
- IntelliJ 3.0 ea for the ones that somewhat relate to the Java code working on them (the convenience of having them in the same environment)

In fact somebody should write some small app that gets configured to start up when I double-click *.xml and, based on reading the doctype and some config file, decides on launching the correct app. Wouldn't that be neat?
(naah, really neat would be the content-management filesystem that does this out of the box)

Well, my premise is not entirely correct; there still is some filename-extension diversification on my hard disk:

[1] There is the whole lot of increasingly less interesting ones I hope to see banned before the end of my lifetime: data files that are (so-called) 'managed' by a specific application. The truth often is that that app 'stole' my data, and I politely have to ask it if I can please see, edit, print, search... my own information. *.xls, *.doc, *.whachamacallit. Let us forget about these in terms of editable things; instead (with POI) these could become one of:

[2] There is a fair number of view-type file extensions: *.jpeg, *.gif, *.html, *.pdf, *.the-media-types. These are to be considered consumption-only types: we should not edit them. The extension is a hint for choosing the decoder/viewer to actually get some consumable format, either spots on paper or pixels on screen (the last ones could be moving, and of course other senses could be triggered: audio).

[3] Archiving file extensions: *.arj, *.tar.gz, *.zip. These feel like an odd type of directory/folder (which by some quirk of history are most often seen extension-less) rather than one of the previously mentioned view types. (Although: self-extracting executables, or archives of e.g. a full html site with an index.html starting point and inlined images, could be seen as a view/consumption type of thing? If only all of these had some manifest file in there describing their reason for existence... heck, that _is_ the same for directories.)

[4] The XML world by itself seems to keep a nicer extension-based distinction between the formats it is bringing to life itself: *.xsl, *.dtd, *.xsd, *.fo, *.svg, ... Looking at the catch-all *.xml, this practice seems somewhat contradictory, no? (Except for the non-XML DTDs of course.) To the uninitiated, *.svg really doesn't look like it is written down in XML.
[conclusion]

What I'd like to conclude at this stage is that the concept we all know as the file extension:
- is overused for catching different aspects of really what_its_for, what_its_about, how_its_encoded, ...
- has lost meaning in the catch-all *.xml case.

(This last remark equally holds for the MIME type text/xml: is svg supposed to be communicated as such? why (not)? when? I googled for this: there seem to be image/svg+xml, image/svg-xml and text/svg+xml; does anyone actually know?)

[proposal 1]

Let us use hierarchical filename extensions for our source documents. They should be capable of holding the different aspects in distinct and recognizable ways. At this stage I think we can pull it off with two steps in the extension. The pattern then becomes:

[naming-and-addressing-part].[document-type-part].[content-consumption-part]

Here the document-type-part surely reflects the !DOCTYPE in XML files. Examples: *.faq.*, *.doc.*, *.howto.*, *.build.*

The content-consumption-part should provide a hint to the encoding, and thus the viewer to use to visualize the content. Examples: *.*.xml (native structure format), *.*.html, *.*.pdf, *.*.jpg. (This is pretty much the filesystem substitute for what the HTTP Content-Type header is doing; I just didn't want to call it content-type-part, as that is possibly confusing with respect to document-type.)

Completed, we get examples like:
- what_is_forrest.faq.xml: src xml file (native structure format) of a faq about what forrest is
- forrest_contract.doc.pdf: pdf version of a document on the contract between forrest and its users
- cvs-ssh.howto.html: html version of a howto on using cvs over ssh
- MyFutureClass.xjava.html: html version (probably syntax-highlighted) of an original Java source file (this suggests that there was some other process that generated the marked-up xjava version out of MyFutureClass.java)
- MyFutureClass.xjdoc.pdf: pdf version of some xdoclet-produced javadoc for the same class.
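The split suggested above is simple enough to pin down in code; a purely illustrative sketch (the tuple layout is my own invention):

```python
def split_name(filename):
    """Split a hierarchical filename into
    (name, document_type_trail, consumption_part).

    what_is_forrest.faq.xml   -> ("what_is_forrest", ["faq"], "xml")
    testresult.metric.svg.jpg -> ("testresult", ["metric", "svg"], "jpg")

    A single extension (index.html) carries no declared document type,
    so its trail is empty -- the unstructured/historical case.
    """
    parts = filename.split(".")
    if len(parts) < 3:
        name = parts[0]
        ext = parts[1] if len(parts) == 2 else None
        return name, [], ext
    return parts[0], parts[1:-1], parts[-1]
```

Note how a trail of more than one part falls out for free, which matters for the route idea (*.metric.svg.jpg) discussed further on.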
(comparable suggestion on xjdoc generation)

What about unstructured (historical) documents? Examples: mpo.jpeg, index.html. Dunno, I guess they are what they are, so let them be? Forrest should just 'read' those, not work on them anyway.

Could we drop the *.xml? As in the *.xsl and *.xsd examples, we could decide on dropping the *.xml altogether. Just using one extension part would then assume the .xml suffix. However, we risk losing the clear distinction with unstructured/historical files. Having only one extension part should mean: they don't know about using two.

The thin line between document-type and content-consumption could be hard to draw: are xhtml and svg (to mention only two) document-type describing or rather content-consumption describing? Maybe we don't need to decide that in general, but instead let the use in practice define how it was intended (by author/publisher) in each particular case. Next to that, I don't see why there would be no room for something like testresult.metric.svg.jpg? Looking at it from this angle, the multiple parts of the new extension become like a route or a trail describing how to get from *.metric.xml to jpg via svg. (Supposing there could be more than one route.)

Just being practical: (since we generate static sites) would anyone know if this line of thinking can be applied to filenames on CD-ROMs?

-o0o-

[opinion: on the units of content management: the file and the directory]

The bags we call directories on the filesystem are under the control of the content creators. They decide upon using them primarily for their management activities. The directory is the typical unit on which they:
- set rw permissions
- control ownership
- archive, backup, restore, move
- are able to have cvs subsection actions (be it commit, update, diff...)

These concerns can therefore pragmatically take over from the concern to group all files that address a common subject.
(we hope that both will not be in collision)

Grouping all documents of a given document-type is normally the last concern to get any attention in the play. (And that is okay IMHO.)

[conclusion]

I'd like to observe that we are likely to find, at all times, different types of documents mingled in one directory, even if some of them will always (but never say never) be put into separate directories. Reversed: a directory (name) cannot be used to identify the various types of documents the contained files are able to carry.

[proposal 2]

Let us avoid using the leading part of path identifiers (and URIs) to have anything to do with the 'type' of documents.

[proposal 2bis]

And as an on-the-side, minor proposal: let us again put images that are only inlined in one document with that document. (Having a central image bank is another thing; it should be considered, but is not what we are currently seeing a need for.)

-o0o-

[opinion: on separating all concerns up to the level that we need to express them all inside ONE (too short) URI]

In the web world according to Cocoon, incoming requests drive everything. This everything is far more than merely where that resource is stored on the local hard drive. Careful design of the URI request space needs to reserve room in the URI string to express the different aspects of how the content is to be retrieved, presented, and encoded. The different aspects I see:
- still find the resource file to start from
- be able to select the correct pipeline; that pipeline becomes the one thing that is capable of (1) producing the output format as promised by the URI link (find the pdf here) (2) starting from the document type we are finding
- decide on the additions to the document (the things it should be aggregated with): the navigation view (a meaning for tab?) to apply to it, and the chosen skin
- other generation customizations (run-time formatting parameters) we haven't encountered just yet.
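To make the pipeline-selection aspect concrete, here is a sketch of matching on the trailing extension parts of a URI while leaving the leading path free for locating the source; the pipeline names in the table are invented for illustration:

```python
# Invented pipeline table: (document_type, output_format) -> pipeline name.
PIPELINES = {
    ("faq", "html"): "faq2document2html",
    ("faq", "pdf"): "faq2document2pdf",
    ("howto", "html"): "howto2document2html",
}

def select_pipeline(uri):
    """Pick a pipeline from the trailing extension parts of a URI,
    e.g. 'community/cvs-ssh.howto.html' -> 'howto2document2html'.
    The leading path plays no role in the choice of pipeline."""
    last = uri.rsplit("/", 1)[-1]
    parts = last.split(".")
    if len(parts) < 3:
        return None  # unstructured/historical file: serve as-is
    doc_type, out_fmt = parts[-2], parts[-1]
    return PIPELINES.get((doc_type, out_fmt))
```

In sitemap terms this is roughly a matcher on `**.howto.html` and friends, rather than a matcher on directory prefixes.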
Some of these aspects (like the skin is today) can be fixed for the whole site being generated. Generating static site versions means we cannot express any of these with ?param=value additions.

The normal path-like remainder allows for naturally organizing hierarchical aspects (aspects that have some super-sub or containment relation to each other). In this case however we will be forced to see it as just position-sequential parts, which means we will need to just count down and give the different aspects a position in the URI. The worst thing about this is that the resulting scheme can possibly break when we need to consider new aspects in the future. (See http://www.w3.org/Provider/Style/URI.html.) So we need to think ahead now, or need some smart idea for making it extensible.

[conclusion]

Given the fact of static generation (the fact being that the result needs to be stored and published as-is on a dumb webserver), the forrest URI request space needs to map to an actual file and directory layout (which could still largely differ from the src docs layout, of course). This poses some extra constraints on which parts of the URI we can use.

[proposal 3]

The proposal for the from-type-to-type-trail file extensions can be reused inside URIs as well. They will help us select the pipeline to apply AND they will (more or less) double-check that the input type for the pipeline indeed maps to all the actions (transformations) we are going to run over it. This allows the pipeline matchers to work on the trailing part of the incoming URI, leaving us with the leading part to grasp the rest. That rest...
??? is it more than finding the actual docs in the files? (oh, I hope it is not)
??? what are the tabs doing?
??? what decides in which tab you are living?

-o0o-

[opinion: on inversion of control]

Well, I was preparing this...
and then this guy said it all a lot better: http://www.webweavertech.com/jefft/weblog/archives/000027.html

The most-used build target for the forrest project (apart from clean) must be docs. My feeling is that it passes by the chance to really tell people what forrest is about. I know we might be struggling to pin down what it _is_ about, but we could easily agree that it is _not_ about having all our users publish the http://outerthought.net/forrest/ (or) http://krysalis.org/forrest/ equivalent on their websites.

The 'docs' target to me is performing the functional equivalent of 'testing' what a 'build' of forrest can do. That we do it on our own documentation stuff makes all the sense in the world. We are as good a case as anyone else's project to start from, but we should stop letting our build file pretend it is the *only* case.

[conclusion]

Our build system has no clear thing to build, and thus lacks some visible production to be reused.

[proposal 4]

Solving this very issue has been living under the umbrella 'refactoring build.xml with the ideas of the forrestbot'. Actually doing it, however, reveals more things that are not right:
- with the bot (how it could be both extended and reused --> further reading)
- with the existential thing forrest should become, and that is: an ant task.

However that is a long way off. (I know Nicola is working on one, but I would like to call his vision the cocoon-generator task, since that is what it is currently focusing on.) In fact it could be challenged at this stage whether it is feasible at all to cram all that complexity into a task. The great thing however is that Nicola is offering us centipede... There are a lot of great things to read about it over at http://krysalis.org (I'm still catching up), but I would like to present it to you now as a way to call cents.
Cents are complex ant-like tasks that (and this is great) can be packaged almost as separate projects: with their own resources in files and directories, and an actual build.xml (called the xbuild.xml). (Of course cent.build.xml would have been a far better name :-))

The consequence of this is that we give control back to the projects that use forrest. So in good IoC tradition (Hollywood principle: don't call us, we'll call you) we end up waiting in the ant chain of dependencies to get called, rather than BEING the template build.xml and directory structure that is forced upon your project.

Looking at the interface of this forrest-cent, we will need a complex set of arguments that parameterize the site-assembly process. It will be best to catch those in some configuration file (I even think centipede proposes such a thing: properties.xml?). Where the generated site needs to be put on hard disk will surely be another thing we need to be informed about.

For those that do not like the centipede dependency (they exist) we should also foresee a means to package up forrest in different incarnations (more useful targets). What comes to mind is:
- a fully independent bin distribution one needs to install, then set FORREST_HOME and call forrest.sh|bat to have it do its work --> this would be typical for forrestbot setups on servers that, unlike workstations, would not even have ant in place or such
- a maven plugin (hoping Jeff Turner stays around)
- a nicely handmade and standalone Java program?

leading to:
- as said, the ant task, and thus the jar to add to $ANT_HOME/lib

[proposal 4bis]

On the side, the proposal on a more organizational level would be to more and more actively promote forrest, and to support our users with fewer excuses (http://marc.theaimsgroup.com/?l=forrest-dev&m=102950445802665&w=2) and more easily applicable and working solutions. Oh yes, fewer docos like this as well, probably :-)

-o0o-

[opinion: on files we don't know about.]
Currently forrest is supporting a handful of DTDs (faq, howto, xdoc, status, changes, dtd...) and stands (paralyzed?) at the dawn of thinking about how to ever work on all the other stuff (maven is doing) typically needed in project sites:
- junit test reports
- java source marked up for syntax highlighting
- javadoc pages
- mail archives
- ...

Some of these will be dynamically pulled through a generator; some of them however will need to be prebuilt by another process (ant task, cent or other). (Javadoc is a great example of this: you _need_ to produce the whole set in one run.)

Also, inside projects that want to use forrest as their documentation and site publishing thingy, people might have been using different standards. (Oh no, here comes the next docbook discussion thread.)

The two-fold challenge/question is:
1. Where to put/find these files
   - with the purpose of picking them up in the site generation process
   - with the intent to cross-reference them from other parts of the documentation system (including the navigation, be it via book or libre)
2. How to augment the sitemap with possibly required new pipelines (e.g. to use the new your-type2xdoc.xsl)

[conclusion]

Forrest will need to work on files we don't know about. We need to provide some mechanism to extend:
- the number of document types for which we provide support (new pipelines?)
- the location of these babes, in a way they can be referenced (from documentation, or navigation)

While doing the exercise we should consider making it such that our users can use it as well. For most of them, building a mount-sitemap will require a learning investment that is not justified by what they get out of it. (They would rather just tag along ant-style, if you ask me.)

[proposal 5]

Restating the multipart file-extension solution is getting a bit overly satisfied with my own thinking of course...
(btw Steven was the first using it, for the *.dtdx.*; I just happened to like the elegance of how that was added to forrest)

The additional thing at this stage would be to let forrest-using projects declare which other document types they want to throw in: some configuration file that allows them to express which document types they use, how to recognize them using the file extension, which public identifier they propose, and where the DTD and the appropriate *2xdoc.xsl are. A typical snippet of configuration XML could be: [XML snippet lost in archiving]. And maybe even, for non-XML docos that require/provide a cocoon generator to start with: [XML snippet lost in archiving].

Out of this, the actual sitemap (or parts of it) could be generated. Equally, the catalog file could be rendered. Both would be placed at predefined locations.

As stated before, forrest internally should use the same mechanism; maybe that could double as the pre-packaged set of core document types so people don't need to do that work again. (Sure enough, the quality of our current set of DTDs is what is drawing and keeping our current user base, so let us leave room for that appreciation in the future.)

As for the problem of where to put (and how to reference) these thrown-in files... there should just be a list of sections we could link to, in combination with some aliasing scheme and a config that explains where these files are inside your project (at the time you call the forrest cent task). Something along the lines of [XML snippet lost in archiving] could do the trick, combined with some generated ant copy-tasks (a bit like the way forrestbot works) that move the described parts to the cocoon context directory (into the position the @ref is proposing). (FYI: the cocoon context directory is what the off-line generation process uses as a starting point to find sitemap, content, stylesheets, ...)

The next thing we need is an addressing scheme based on this, to allow cross-references between e.g. handwritten xdocs and generated stuff. That could be done by simply using the link href="/junit/..." of course.
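Since the inline snippets apparently did not survive the mail archiver, here is a purely hypothetical guess at what such a document-type declaration might look like; every element and attribute name below is invented for illustration and is not a proposal for concrete syntax:

```xml
<!-- HYPOTHETICAL: all names below are invented for illustration -->
<document-types>
  <document-type name="mydoc" extension="mydoc">
    <public-id>-//MyProject//DTD MyDoc V1.0//EN</public-id>
    <dtd location="resources/schema/dtd/mydoc-v10.dtd"/>
    <transform to="xdoc" location="resources/stylesheets/mydoc2xdoc.xsl"/>
  </document-type>
  <!-- a non-XML source that needs a cocoon generator to start with -->
  <document-type name="javasrc" extension="java">
    <generator class="org.example.JavaSourceGenerator"/>
  </document-type>
</document-types>
```

From declarations of this shape, both the sitemap fragments (matchers plus pipelines) and the catalog entries could be mechanically generated.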
This approach however assumes that the same person writing the href in the content can control the mentioned configuration fragment (and vice versa). In cases where the hrefs could even be generated by another tool, that assumption only gets more optimistic (less likely).

Next to this, to complete the picture:
- Relative links don't start with a / and remain inside one content-part.
- The navigation building files (book.xml?) should use the same scheme.

In every case, augmenting it [XML snippet lost in archiving] would catch the idea of a content-part claiming more of the reference space to point to itself. Again, generating matchers (i.e. parts of the sitemap) out of this, that redirect to the reference, would make sure the correct document is retrieved. On the other hand, using this information inside a smart transformer that gets applied just before the skinning would be even better, since that would reduce the number of possibly duplicated copies in the generated site (especially bad for generated pdfs). This beast should be on the look-out for href attributes and replace found link-aliases with their actual ref.

[proposal 5bis]

On-the-side minor proposal, added here since the recent xhtml thread: just maybe there should be some SoC consideration between xdocs and xhtml2. While xdocs seems to be a nice way to write down relatively simple documentation, use of (a subset of) xhtml2 as the intermediate (just before skinning) looks to me like not such a bad choice. I understand that it needs more investigation, and the docuheads on the list could help us out a great deal... it is just my feeling that it would be helpful to other people if they could reuse some ready-made my-type2xhtml rather than rewrite it towards a my-type2xdoc format. So the either-or question to me could get a both-and answer: different purpose, different document type?

-o0o-

[opinion: on cross references and linkmaps.]
In fact, just thinking about this brings back memories of something like the first cocoon-dev message I've seen (that I remember, at least: http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=98379467820862&w=2). Rereading it now (I was hoping for hints), I realize that I've probably never understood it in the first place. :-) ("Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." -Albert Einstein) So please indulge my ignorance...

The sitemap is a great gift. But it is not solving world famine. It manages the distribution of incoming requests, but (irreversibly) it fails to produce the map of all available resources it is managing! Put in other words: the URI that would fit the common (i.e. not the cocoon connotation) web description 'SiteMap' would never be based on the sitemap.xmap. (This is pretty much the same way that website-authorization files do a good job of blocking requests that are not allowed, but leave it to other systems to produce a navigation and set of cross-refs that only contain links to files you have access to.)

[conclusion]

Navigation should not try to reverse the information in the sitemap, because it will not succeed. In the overall solution however, 'Navigation' (cross references) must be able to (via the end user) close the loop the sitemap has opened: the sitemap points from URI to content; content wants to link back to URI.

[proposal 6]

I would like to propose the forrest SitePlan. (At this early stage it is probably rather incomplete, but hopefully this gets us going.) This just grabs together what we already had going...

At the heart of forrest there should be one of these as well. Projects using forrest can slide in their own to augment/override the settings in there. This project-specific siteplan should be joined with the one from forrest-core that serves as a fallback and an example.

From this file the catalog file gets generated.
From this file the sitemap (xslt task) gets generated.
From this file a temporary ant build file is generated.
This file is input to the forrest-cent.
This file is picked up by the future forrestbot as well...

-o0o-

[opinion: on the public interface of build files]

To let the forrestbot step in just like that, we are however missing some more information. Basically this lack of intelligence (CIA meaning of the word) is a side effect of the cent approach: since we inverted control (see one of the previous topics), forrest is no longer on top of things. We don't control all details of the build file any more! (Because we want to hide those, and thus not be put into the user's project build file.)

However, the siteplan introduced the notion of different content-parts that need to be moved into the cocoon context directory before the site generation takes place. Some of the files in those content-parts might have been generated by other tasks (the javadoc example). Which of those tasks we depend upon is unclear to the outside forrestbot, since it is the project ant file that controls that.

Unless we parse the build file? (That would be spying around, uh.) No, not even. Parsing alone will not help you; additionally you will need to understand the various internals of the called tasks to really know where the stuff you need is being put by them.

This is because ant build files (when considered as objects or components) have a public interface consisting of only voids. To the outside world they list a number of targets (methods) to call. These targets are somewhat virtual names relating to actual real-life 'productions' that are made by the enclosed tasks. (Setting a check-property is of course an equally virtual 'production', but you catch my drift, right?) With the 'being void' statement I draw attention to the fact that the generated 'productions' are but hidden side-effects, only known to the implementer of the ant build file.
(Like after you have called an ant target for some new project, you always have to guess where in the ./build (or was it ./dist?) you can start looking around for something you recognize/expect/hope for.)

I lack the experience and insight into the bigger running of things in the jakarta and apache world, but I would love to see this talked about and discussed in a bigger forum. Taking this up at the forrest level seems not to fit the bigger scope of it.

The very basic view I currently have is for ant files to be able to list, in their -projecthelp, where which productions are actually to be found. This could be based on the build file having some means of expressing that. (Note that there could be more than one production per target, that productions could be root-like directories, and that one could still choose not to 'return' == 'make known to the public' all of them.) Additionally, calling ant at the command line (or via the ant task of course) could maybe have an option to specify where (some of) the productions (by name?) should be placed (or copied to).

This would allow for:
- bots like ours to be written once and for all: just know which ant target to call on the project, and know where the result is put. That reduces the work to the abstract 3 stages: let the bot get the src, call that specific target, deploy the result it generated. Done. The forrestbot becomes an ant-bot, since it will work for anything that is ant-able.
- also, gump-like integration meta-projects could be defined in terms of a meta-build file (simple ant again) that expresses which targets on which projects to call, and which are source to whatever target in the other project...
- finally, ant-based installation (like the acorn patch from Jeff Turner showed) could be based on a general, reusable idiom here.

oh, well...

[conclusion]

Ant targets are voids with noticeable side-effects (known to the implementer). The actual productions however are not 'returned'.
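To illustrate the kind of declaration argued for above: the <production> element in this sketch does NOT exist in Ant; it is a purely hypothetical extension, with invented paths and names.

```xml
<!-- HYPOTHETICAL: Ant has no <production> element; invented for illustration -->
<target name="javadoc" description="Generate the API docs">
  <javadoc sourcepath="src/java"
           destdir="build/javadoc"
           packagenames="org.example.*"/>
  <!-- advertise the otherwise hidden 'return value' of this void target -->
  <production name="api-docs" dir="build/javadoc"/>
</target>
```

A bot could then ask -projecthelp (or the Ant API) for the productions of a target and know where to pick up the result, without reading the build file's internals.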
As a result, tasks on the outside of the build script may depend on them, but cannot get hold of them (unless you open the ant file and track down where your 'production' was put). For the outside forrestbot this has the net effect that we need to capture the knowledge of the dependent tasks inside the project.

[proposal 7]

It is a bit of an avoidance of the discussion at this stage, but pulling some of it back into the forrest arena: the siteplan could probably list the ant targets we depend on (since it was already listing the required output location). It does create some bad vibes around duplicating the knowledge inside the ant file (the @depends of the target that calls the forrest-cent), and thus opens ambiguity for users who would expect the local forrest target to automatically call the tasks; they would omit the @depends, expecting forrest to do that.

-o0o-

We're in a pursuit for meaning, something that in any possible way will lead to solutions of the 'Turtles all the way down' type: http://andstuff.org/TurtlesAllTheWayDown. In this case it will be more and more dots in filenames :-)

"Would you tell me, please, which way I ought to go from here?"
"That depends a good deal on where you want to get to," said the Cat.
"I don't much care where--" said Alice.
"Then it doesn't much matter which way you go," said the Cat.
"--so long as I get somewhere," Alice added as an explanation.
"Oh, you're sure to do that," said the Cat, "if only you walk long enough."
(Lewis Carroll, Alice in Wonderland)

Thanks for walking up to here.

Unresolved issues I see:
- what are tabs (and could they be requiring a prefix in our URL space?)
- use cases for generating navigation tree structures for the site (libre is to be seen as a way to generate those based on rules)

-marc

--
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org