forrest-dev mailing list archives

From Nicola Ken Barozzi <nicola...@apache.org>
Subject Re: Cocoon CLI - how to generate the whole site (Re: The Mythical Javadoc generator (Re: Conflict resolution))
Date Mon, 16 Dec 2002 13:01:52 GMT


Jeff Turner wrote:
> On Mon, Dec 16, 2002 at 08:59:32AM +0100, Nicola Ken Barozzi wrote:
> 
>>Jeff Turner wrote:
> 
> ...
> 
>>>>We've established that Cocoon is not going to be invoking Javadoc.  That
>>>>means that the user could generate the Javadocs _after_ they generate the
>>>>Cocoon docs.
>>>>
>>>>To handle this possibility, the only course of action is to ignore links
>>>>to external directories like Javadocs.  What alternative is there?
>>
>>Yes, but I don't want this to happen, as I said in other mails.
>>The fact is that for every URI sub-space we take away from Cocoon, we 
>>should have something that manages it for Cocoon, and that's for *all* 
>>the environments Cocoon has to offer, because Forrest is made to run in 
>>all of them.
> 
> 
> Ah, gotcha :)

Phew, it took a long time, didn't it?

> Though remember, with the file: patch, the sitemap *did* serve up files,
> through this rule:
> 
> <map:match pattern="**">
>   <map:act type="resource-exists">
>     <map:parameter name="url" value="content/{1}"/>
>     <map:read src="content/{../1}"/>
>   </map:act>
> </map:match>
> 
> So it worked in both command-line and webapp.  The command-line solution
> just happened to bypass the Cocoon CLI.

Which is the point :-)

> The file: patch has two effects:
> 
>  - Introduce schemes in xdocs, starting with a 'file:' scheme.  I think
>    that schemes in general are uncontroversial.  When linkmaps arrive,
>    90% of links are going to be linkmap links, so having a scheme prefix
>    should be the norm. 

I'm totally for the scheme concept. But schemes are IMHO only 
link-rewriting rules, and should not address other concerns.
A file: scheme would not do any rewriting, so I don't see the need ATM.

>  - Routes around a CLI bug, by copying static files with Ant, rather than
>    through the CLI.

Yup, that's the major point that I didn't like.

> What we really need to agree on is the first point; whether we want to
> prefix static links with 'file:'.  When xdocs are swarming with linkmap:,
> java:, person:, mail:, etc links, why not have file:?  Conversely, if we
> want to "infer" the file: scheme, are we going to try to infer all the
> other schemes?

Hmmm, I don't see a big problem here, but I may well be wrong.

The schemes are link-rewriting systems. Why would we need to rewrite 
"file:" links? Remember that to get a specific type of "view" on a file 
we have the mime-type attribute in links.
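To make that distinction concrete, here is a minimal sketch of "schemes 
as link-rewriting rules". The scheme table and rewrite targets below are 
invented for illustration, not Forrest's actual ones; the point is that a 
linkmap: scheme rewrites its reference, while a file: scheme would be pure 
identity:

```python
# Hypothetical scheme table: each scheme is just a rewriting rule.
# (Names and rewrite targets are made up for illustration.)
SCHEMES = {
    "linkmap": lambda ref: "/site/" + ref + ".html",  # rewrites the reference
    "file":    lambda ref: ref,                       # identity: rewrites nothing
}

def rewrite_link(href):
    """Apply the scheme's rewriting rule, if the link carries one."""
    scheme, sep, ref = href.partition(":")
    if sep and scheme in SCHEMES:
        return SCHEMES[scheme](ref)
    return href  # schemeless links pass through untouched
```

Since file: maps every reference to itself, the scheme carries no 
rewriting information -- which is the argument above for not needing it.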

>>If we had a CLI-only Forrest, I could say ok, let's do it, let's make 
>>Ant handle that, but I don't want to see different "special cases" of 
>>handling these spaces. Your proposal nevertheless has IMHO the same 
>>drawbacks as before.
> 
> Yes I see.  It hacks around a CLI bug, and introduces a mechanism by
> which further potentially-hack-requiring schemes (like java:) could be
> implemented.

I'm quite confident that we won't use "hack-requiring schemes".
At least that's my goal.

>>>>One thing we could do, is record all 'unprocessable' links in an external
>>>>file, and then the Ant script responsible for invoking Cocoon can look at
>>>>that, and ensure that the links won't break.  For example, say Cocoon
>>>>encounters an unprocessable 'java:org.apache.foo' link.  Cocoon records
>>>>that in unprocessed-files.txt, and otherwise ignore it.  Then, after the
>>>><java> task has finished running Cocoon, an Ant task examines
>>>>unprocessed-files.txt, and if any java: links are recorded, it invokes a
>>>>Javadoc task.
>>>>
>>>>So we have a kind of loose coupling between Cocoon and other doc
>>>>generators.  Cocoon isn't _responsible_ for generating Javadocs, but it
>>>>can _cause_ Javadocs to be generated, by recording that fact that it
>>>>encountered a java: link and couldn't handle it.
>>
>>Hmmm... this idea is somewhat new... the problem is that it breaks down 
>>with the Cocoon webapp.
> 
> It doesn't break down.  It makes the CLI solution independent of the
> webapp solution.  In the case of file:, the webapp happened to have
> solved the problem.
> 
> 
>>My point is IMHO simple: if the webapp Cocoon can handle it, the CLI 
>>should similarly handle it. No special cases. If Cocoon has to trigger 
>>some outer system, we already have Generators, Transformers, Actions, 
>>etc, no need to create another system that BTW bypasses all Cocoon 
>>environment abstractions.
> 
> 
> Yes, that's the ideal.
> 
> 
>>IMHO, Cocoon is the last step, the publishing step. This is the only way 
>>I see to keep consistency between the different Cocoon running modes. 
>>Hence I don't think that triggering actions after the Cocoon CLI is 
>>going to solve problems; instead it creates more, since it breaks the 
>>sitemap.
> 
> Not break, just doesn't solve the problem with the same mechanism.
> Remember we only have two 'running modes': webapp and CLI.

Not for long. Gianugo is probably going to work on an EJB environment 
soon, we have an Ant one in the works, and in the future an 
Avalon-native-component version.

>>You say that the webapp is the primary Cocoon-Forrest method, and as you 
>>know I agree. The CLI is just a way of recreating the same 
>>user experience by acting as a user who clicks on all links.
>>
>>BUT the user doesn't necessarily work like this: the user can also type 
>>a URL into the address field, even if it's not linked, but the CLI won't 
>>generate this.
>>Why?
>>Because Cocoon is not an invertible function. That means that given 
>>sources and a sitemap, we *cannot* create all the possible positive 
>>requests. Which in turn means that the Cocoon CLI will never be able to 
>>create a fully equivalent site as the webapp.
>>
>>So we should acknowledge that we need a mechanism that given some rules, 
>>can reasonably create an equivalent site. Crawling is it, and it 
>>generally works well, since usually sites need to be linked from a 
>>homepage to be accessed. Site usage goes through navigation, ie links.
>>
>>Now, Cocoon is not invertible, and this is IMHO a fact. But *parts* of 
>>the sitemap *are* invertible. These parts are basically those where a 
>>complete URI sub-space is mapped to a specific pipeline, and when no 
>>parts of it have been matched before.
>>
>>
>>    <map:match pattern="sub/URI/space/**">
>>       ...
>>    </map:match>
>>
>>
>>This means that we can safely invert Cocoon here, and look at the 
>>sources to know what the result will look like.
>>
>>Conceptually, this gives me the theoretical possibility of doing CLI 
>>optimizations for crawling without changing the Cocoon usage patterns. 
>>It's an optimization inside the CLI, and nothing outside changes.
> 
> Yes!  Today's Mr Clever Award goes to Nicola, for working all this out
> and presenting it so clearly :)
> 
> So really, the CLI could short-cut any URI served with <map:read>.

Not exactly. Non-reads can also be dealt with this way. It's not the 
read part that it short-cuts, but the URI space handling.
I.e., if a pipeline handles a whole URI space, it can safely invert that 
*match* (not the pipeline). See below.

> The "how to invert a sitemap" question also pops up when trying to
> auto-generate a linkmap (specifically, link targets), so a general
> solution (insofar as one is possible) would be very useful.
> 
> One thing I don't see: how does the CLI know that when one Javadoc file
> is referenced, it must copy all of them across?  Remember, you stripped
> the 'java:' scheme in step 1.

Actually, it simply would not crawl that URI space.

This is how it could do it as a start:

1) get all the "matches" in the sitemap; attention must be paid to 
nested matches.

2) the ones ending in ** are to be taken into account.

3) for each of those matches, it inverts the match and is able to "map" 
the source and output spaces. Basically it scans all the subdirs defined 
in the match, gathers all the filenames, rewrites them as URIs using the 
inverted match, and calls Cocoon on them one by one.

3b) [second optimization] *If* the pipeline is a read, it can simply 
copy the files across and change filenames according to the inverted 
match rule.

4) then it can start crawling the docs, remembering not to follow links 
into the spaces already generated.

In essence, we can avoid crawling for those parts of a website, so 
generation is much faster.
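A rough sketch of steps 1-3, assuming the simplest case: a match whose 
pattern ends in ** and whose source expression ends in {1}. The pattern 
and directory names here are invented for illustration:

```python
def invert_match(uri_pattern, src_pattern, source_files):
    """For a match like <map:match pattern="apidocs/**"> with
    src="content/apidocs/{1}", rewrite each source filename into the
    URI the pipeline would serve it under (the inverted match)."""
    assert uri_pattern.endswith("**") and src_pattern.endswith("{1}")
    uri_prefix = uri_pattern[:-len("**")]
    src_prefix = src_pattern[:-len("{1}")]
    # Map the source space onto the output URI space, file by file.
    return [uri_prefix + path[len(src_prefix):]
            for path in source_files
            if path.startswith(src_prefix)]
```

Each resulting URI can then be requested from Cocoon one by one; or, for 
a plain read (step 3b), the file can simply be copied across under that 
name.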

>>Now, since the theory is solved, the question slides to how to do it, 
>>especially because the pattern can have sitemap variable substitutions 
>>in it.
> 
> So we have two options:
> 
> 1) Implement a sitemap inverter, use it to create a 'lookup table' of
> shortcuttable URIs, and then integrate this into the CLI.
> 2) Say "life's too short, let's just copy the files with Ant".
> 
> Now, practically, solution 1) is going to take a _long_ time to be
> developed.  If it comes down to me, it will be developed when the linkmap
> needs it.
> 
> So, given that 2) is dead simple and 90% implemented, how about going
> with it for now, and replacing it with 1) when that arrives?  As long as
> the public interface (link syntax) is maintained, we can switch
> implementations without affecting users.

Let's define the syntax then. I don't see the need for a "file:" scheme; 
let's discuss that point.

As for individual files, we should be able to fix this by using a 
MimeTypeAction that defines the actual mime-type of the file, and/or by 
fixing the CLI so that it doesn't append ".html" to files whose mime 
type it doesn't know.
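A sketch of what that CLI fix could look like, assuming the CLI knows 
each resource's mime type; the function and naming rule are illustrative, 
not the actual CLI code:

```python
import os.path

def output_filename(uri, mime_type):
    """Append '.html' only when the resource really is HTML and the URI
    has no extension yet; leave everything else alone."""
    _root, ext = os.path.splitext(uri)
    if mime_type == "text/html" and not ext:
        return uri + ".html"
    return uri
```

With this rule, an extensionless HTML page still gets a usable filename, 
while files of unknown or binary mime types keep their names unchanged.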

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)