forrest-dev mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: Cocoon CLI - how to generate the whole site (Re: The Mythical Javadoc generator (Re: Conflict resolution))
Date Mon, 16 Dec 2002 12:03:09 GMT
On Mon, Dec 16, 2002 at 08:59:32AM +0100, Nicola Ken Barozzi wrote:
> 
> Jeff Turner wrote:
...
> >>We've established that Cocoon is not going to be invoking Javadoc.  That
> >>means that the user could generate the Javadocs _after_ they generate the
> >>Cocoon docs.
> >>
> >>To handle this possibility, the only course of action is to ignore links
> >>to external directories like Javadocs.  What alternative is there?
> 
> Yes, but I don't want this to happen, as I said in other mails.
> The fact is that for every URI sub-space we take away from Cocoon, we 
> should have something that manages it for Cocoon, and that's for *all* 
> the environments Cocoon has to offer, because Forrest is made to run in 
> all of them.

Ah, gotcha :)

Though remember, with the file: patch, the sitemap *did* serve up files,
through this rule:

<map:match pattern="**">
  <map:act type="resource-exists">
    <map:parameter name="url" value="content/{1}"/>
    <map:read src="content/{../1}"/>
  </map:act>
</map:match>

So it worked in both command-line and webapp.  The command-line solution
just happened to bypass the Cocoon CLI.

The file: patch has two effects:

 - Introduce schemes in xdocs, starting with a 'file:' scheme.  I think
   that schemes in general are uncontroversial.  When linkmaps arrive,
   90% of links are going to be linkmap links, so having a scheme prefix
   should be the norm. 

 - Route around a CLI bug, by copying static files with Ant rather than
   through the CLI.
  
What we really need to agree on is the first point: whether we want to
prefix static links with 'file:'.  When xdocs are swarming with linkmap:,
java:, person:, mail:, etc links, why not have file:?  Conversely, if we
want to "infer" the file: scheme, are we going to try to infer all the
other schemes?
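
To make that concrete, here is a minimal sketch in Python (not Forrest
code) of what an explicit scheme prefix buys a link processor.  The name
split_link and the KNOWN_SCHEMES set are hypothetical, chosen only for
illustration; the scheme names are the ones listed above.

```python
# Hypothetical set of xdoc link schemes; not an actual Forrest registry.
KNOWN_SCHEMES = {"file", "linkmap", "java", "person", "mail"}


def split_link(href):
    """Return (scheme, target); scheme is None when no known prefix is present."""
    scheme, sep, rest = href.partition(":")
    if sep and scheme in KNOWN_SCHEMES:
        return scheme, rest
    # No recognised prefix: the link is left for normal processing.
    return None, href


print(split_link("file:hello.pdf"))
print(split_link("java:org.apache.foo"))
print(split_link("index.html"))
```

With explicit prefixes the dispatch is a plain lookup.  Inferring the
scheme would instead need a separate guess for every unprefixed link.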

> If we had a CLI-only Forrest, I could say ok, let's do it, let's make 
> Ant handle that, but I don't want to see different "special cases" of 
> handling these spaces. IMHO your proposal nevertheless has the same 
> drawbacks as before.

Yes I see.  It hacks around a CLI bug, and introduces a mechanism by
which further potentially-hack-requiring schemes (like java:) could be
implemented.

> >>One thing we could do, is record all 'unprocessable' links in an external
> >>file, and then the Ant script responsible for invoking Cocoon can look at
> >>that, and ensure that the links won't break.  For example, say Cocoon
> >>encounters an unprocessable 'java:org.apache.foo' link.  Cocoon records
> >>that in unprocessed-files.txt, and otherwise ignores it.  Then, after the
> >><java> task has finished running Cocoon, an Ant task examines
> >>unprocessed-files.txt, and if any java: links are recorded, it invokes a
> >>Javadoc task.
> >>
> >>So we have a kind of loose coupling between Cocoon and other doc
> >>generators.  Cocoon isn't _responsible_ for generating Javadocs, but it
> >>can _cause_ Javadocs to be generated, by recording the fact that it
> >>encountered a java: link and couldn't handle it.
> 
> Hmmm... this idea is somewhat new... the problem is that it breaks down 
> with the Cocoon webapp.

It doesn't break down.  It makes the CLI solution independent of the
webapp solution.  In the case of file:, the webapp happened to have
solved the problem.
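
A rough sketch of the post-Cocoon half of that loose coupling, in Python
for brevity rather than as an Ant task.  It assumes unprocessed-files.txt
holds one link per line, which was never specified, and
group_unprocessed is a hypothetical helper name.

```python
def group_unprocessed(lines):
    """Group recorded links by scheme, e.g. {'java': ['org.apache.foo']}."""
    by_scheme = {}
    for line in lines:
        link = line.strip()
        if not link:
            continue  # skip blank lines
        scheme, sep, target = link.partition(":")
        if not sep:
            continue  # no scheme prefix, nothing to dispatch on
        by_scheme.setdefault(scheme, []).append(target)
    return by_scheme


recorded = ["java:org.apache.foo\n", "\n", "java:org.apache.bar\n"]
pending = group_unprocessed(recorded)
if "java" in pending:
    # This is the point where the build would invoke its Javadoc step.
    print("Javadoc needed for:", pending["java"])
```

Cocoon only writes the file; whatever runs afterwards decides what to do
with each scheme it finds there.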

> My point is IMHO simple: if the webapp Cocoon can handle it, the CLI 
> should similarly handle it. No special cases. If Cocoon has to trigger 
> some outer system, we already have Generators, Transformers, Actions, 
> etc, no need to create another system that BTW bypasses all Cocoon 
> environment abstractions.

Yes, that's the ideal.

> IMHO, Cocoon is the last step, the publishing step. This is the only way 
> I see to keep consistency between the different Cocoon running modes. 
> Hence I don't think that triggering actions after Cocoon CLI is going 
> to solve problems, but instead creates more since it breaks the sitemap.

It doesn't break the sitemap; it just doesn't solve the problem with the
same mechanism.  Remember we only have two 'running modes': webapp and CLI.

> You say that the webapp is the primary Cocoon-Forrest method, and as you 
> know I agree. The CLI is just a way of recreating the same 
> user-experience by acting as a user that clicks on all links.
> 
> BUT the user doesn't necessarily work like this, the user can also type 
> in a URL in the address field, even if it's not linked, but the CLI 
> won't generate this.
> Why?
> Because Cocoon is not an invertible function. That means that given 
> sources and a sitemap, we *cannot* create all the possible positive 
> requests. Which in turn means that the Cocoon CLI will never be able to 
> create a fully equivalent site as the webapp.
> 
> So we should acknowledge that we need a mechanism that given some rules, 
> can reasonably create an equivalent site. Crawling is it, and it 
> generally works well, since usually sites need to be linked from a 
> homepage to be accessed. Site usage goes through navigation, ie links.
> 
> Now, Cocoon is not invertible, and this is IMHO a fact. But *parts* of 
> the sitemap *are* invertible. These parts are basically those where a 
> complete URI sub-space is mapped to a specific pipeline, and when no 
> parts of it have been matched before.
> 
> 
>     <map:match pattern="sub/URI/space/**">
>        ...
>     </map:match>
> 
> 
> This means that we can safely invert Cocoon here, and look at the 
> sources to know what the result will look like.
> 
> Conceptually, this gives me the theoretical possibility of doing CLI 
> optimizations for crawling without changing the Cocoon usage patterns. 
> It's an optimization inside the CLI, and nothing outside changes.

Yes!  Today's Mr Clever Award goes to Nicola, for working all this out
and presenting it so clearly :)

So really, the CLI could short-cut any URI served with <map:read>.

The "how to invert a sitemap" question also pops up when trying to
auto-generate a linkmap (specifically, link targets), so a general
solution (insofar as one is possible) would be very useful.

One thing I don't see: how does the CLI know that when one Javadoc file
is referenced, it must copy all of them across?  Remember, you stripped
the 'java:' scheme in step 1.

> Now, since the theory is solved, the question slides to how to do it, 
> especially because the pattern can have sitemap variable substitutions 
> in it.

So we have two options:

1) Implement a sitemap inverter, use it to create a 'lookup table' of
shortcuttable URIs, and then integrate this into the CLI.
2) Say "life's too short, let's just copy the files with Ant".
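
For what it's worth, the 'lookup table' in option 1) might look
something like the Python sketch below for the simplest invertible case:
a whole URI sub-space read straight from a source directory.  The
function name shortcut_table and the example paths are hypothetical, and
the hard part (deriving the prefixes from a real sitemap with variable
substitutions) is not attempted here.

```python
import posixpath


def shortcut_table(uri_prefix, src_prefix, source_files):
    """Map each request URI under uri_prefix to the source file serving it.

    source_files are paths relative to src_prefix, using '/' separators.
    """
    table = {}
    for rel in source_files:
        table[posixpath.join(uri_prefix, rel)] = posixpath.join(src_prefix, rel)
    return table


# Hypothetical sub-space: 'apidocs/**' read from 'content/apidocs/'.
table = shortcut_table(
    "apidocs", "content/apidocs", ["index.html", "org/apache/Foo.html"]
)
for uri, src in sorted(table.items()):
    print(uri, "<-", src)
```

The table is built from the sources rather than from crawled links, so
the CLI could copy every entry without requesting any of them.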

Now, practically, solution 1) is going to take a _long_ time to be
developed.  If it comes down to me, it will be developed when the linkmap
needs it.

So, given that 2) is dead simple and 90% implemented, how about going
with it for now, and replacing it with 1) when that arrives?  As long as
the public interface (link syntax) is maintained, we can switch
implementations without affecting users.


--Jeff
