cocoon-users mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: 2.1: Neither LinkSerializer nor LinkGatherer producing a complete link list
Date Sat, 30 Aug 2003 09:55:15 GMT
On Sat, Aug 30, 2003 at 09:27:17AM +0200, Florian G. Haas wrote:
> Hello,
> 
> since this is my first post to this list, on which I've been lurking for four 
> months, permit me to introduce myself: My name is Florian, I have done some 
> work on the TM4J project (http://www.tm4j.org), and I'm currently working on 
> integrating TM4J with Cocoon for the purpose of generating web sites from XML 
> Topic Maps.

Cool :)  If you come up with something generic and useful, we'd be
very interested in including it in Forrest[1].  Kal Ahmed once brought up
the idea of using TMs in Forrest when the project was just starting:

http://marc.theaimsgroup.com/?t=101647336300007&r=1&w=2

Forrest is just now getting to the stage of needing an internal
metadata model (TMs or RDF).

> Everything I've done so far works nicely in a servlet environment. What I'm 
> trying to do now is use the Cocoon CLI to generate the web site structure 
> offline.
> 
> Now, I'm confronted with the LinkGatherer throwing numerous NPEs in the 
> process, which is why offline generation stops processing immediately after 
> the initial target URL. Effectively, it does not do any link crawling at all. 
> This appears to be a known problem as it's been mentioned in an earlier post 
> on the dev list[1], but seems not to have been fixed. There was also some 
> debate in a thread started on 7-30 on this list[2], but it didn't come to any 
> conclusion.
> 
> Trying to track this problem down a little further, I have tried using the 
> LinkGatherer in the sitemap and accessing the link list from the servlet 
> environment:
> 
> <!-- ... -->
>     <map:transformer name="links"
>                      src="org.apache.cocoon.sitemap.LinkGatherer"/>
> <!-- ... -->
>     <map:pipeline id="links">
>       <map:match pattern="links">
>         <map:generate type="file" src="src/fgh.xtm"/>
>         <map:transform type="xslt" src="xsl/tm.xsl">
>           <!-- lots of parameters here -->
>         </map:transform>
>         <map:transform type="links"/>

I don't think you're meant to add the LinkGatherer explicitly.
Cocoon adds it to the pipeline automatically, just before the
serializer, and only when you are using the 'new' CLI
implementation (passing --xconf=cli.xconf to org.apache.cocoon.Main).

>         <map:serialize type="xml"/>
>       </map:match>
>     </map:pipeline>
> <!-- ... -->
....
> This appears to be the same problem that occurs when using the CLI. I also 
> tried a different approach, using LinkSerializer:
> 
>     <map:pipeline id="links">
>       <map:match pattern="links">
>         <map:generate type="file" src="src/fgh.xtm"/>
>         <map:transform type="xslt" src="xsl/tm.xsl">
>           <!-- lots of parameters here -->
>         </map:transform>
>         <map:serialize type="links"/>
>       </map:match>
>     </map:pipeline>

FYI, LinkSerializer is usually used in conjunction with a 'links'
view.  E.g., from Forrest:

<map:views>
  <map:view name="links" from-position="last">
    <map:transform src="resources/stylesheets/filterlinks.xsl">
      <map:parameter name="ctxbasedir" value="{realpath:.}/"/>
    </map:transform>
    <map:serialize type="links"/>
  </map:view>
</map:views>

Even then, the view is only invoked if you are using the old CLI
implementation (i.e. *not* using cli.xconf).

Btw, Forrest's Ant script has examples of both the new (LinkGatherer)
and old (link view) CLI usages:

http://cvs.apache.org/viewcvs.cgi/xml-forrest/src/resources/forrest-shbat/forrest.build.xml?rev=1.82&content-type=text/vnd.viewcvs-markup
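
For orientation, the new-CLI invocation could be wired into an Ant
target roughly as follows.  This is a minimal sketch, not Forrest's
actual target: the project, target and property names and the jar
directory are assumptions; only the class name and the
--xconf=cli.xconf argument come from this thread.

```xml
<project name="offline-site" default="site" basedir=".">
  <!-- Hypothetical location of the Cocoon jars and their dependencies -->
  <property name="cocoon.lib.dir" value="WEB-INF/lib"/>

  <path id="cocoon.classpath">
    <fileset dir="${cocoon.lib.dir}" includes="*.jar"/>
  </path>

  <target name="site" description="Generate the site offline with the new CLI">
    <!-- Fork a separate JVM and fail the build if Cocoon exits with an error -->
    <java classname="org.apache.cocoon.Main" fork="true" failonerror="true">
      <classpath refid="cocoon.classpath"/>
      <!-- Passing an xconf file selects the 'new' CLI implementation -->
      <arg value="--xconf=cli.xconf"/>
    </java>
  </target>
</project>
```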

> Now it's getting a bit bizarre: The LinkSerializer correctly outputs the very 
> first link in the result document (a link to a CSS stylesheet), but ignores 
> all others (numerous <a href="..."> elements, for example).

Hmm, not sure.  Is your XML using namespaces?  In any case, the
LinkSerializer won't do much good on its own at the end of a
pipeline; it needs to be in a view.
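
A minimal sketch of that arrangement, applied to the pipeline quoted
above: the pipeline keeps its normal serializer, and the link list is
produced by a separate 'links' view.  The match pattern and the choice
of the html serializer are assumptions for illustration; the "file",
"xslt" and "html" components are assumed to be inherited from the root
sitemap, and the Forrest-specific filterlinks.xsl step is left out.

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:components>
    <map:serializers>
      <!-- Writes the links found in the document as a plain-text list -->
      <map:serializer name="links"
                      src="org.apache.cocoon.serialization.LinkSerializer"/>
    </map:serializers>
  </map:components>

  <map:views>
    <!-- from-position="last" taps the pipeline just before its serializer -->
    <map:view name="links" from-position="last">
      <map:serialize type="links"/>
    </map:view>
  </map:views>

  <map:pipelines>
    <map:pipeline>
      <!-- Hypothetical pattern; the pipeline ends in its usual serializer -->
      <map:match pattern="*.html">
        <map:generate type="file" src="src/fgh.xtm"/>
        <map:transform type="xslt" src="xsl/tm.xsl"/>
        <map:serialize type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

With something like this in place, requesting a page with
?cocoon-view=links appended should return the link list instead of the
rendered page.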

> I guess it's worth mentioning that this behavior appears to be limited to the 
> mounted subsitemap in my ~/public_html directory; I've been unable to 
> reproduce it with the rest of the Cocoon samples. The Tomcat process which 
> serves as Cocoon's servlet environment runs as root, so I guess we can rule 
> out file permission-related issues.
> 
> [Side note, something else which baffled me: Shouldn't <map:views> defined in 
> the root sitemap be available to all subsitemaps unless overridden? I, for my 
> part, can't use the content, pretty-content, or links view without 
> redeclaring it in my subsitemap. Serializers, transformers, generators, 
> and apparently everything else defined in the root sitemap work in the 
> subsitemap, though.]

Works fine for me in Forrest:

http://localhost:8888/body-index.html?cocoon-view=links

What does your link view definition look like?

> All problems mentioned occur both in the 2.1 release and in a CVS checkout as 
> of yesterday (8-29).
> 
> Now, my questions:
> 1. Could someone give me a clue as to what may be causing the NPEs for the 
> LinkGatherer? 
> 2. Where can I find documentation on configuration parameters for the 
> LinkGatherer, if any?
> 3. What might be causing this strange behaviour for the LinkSerializer?
> 4. How, provided I get the links view working, can I configure the CLI to use 
> "the old way" instead of the LinkGatherer? It's been mentioned[3] that this 
> is possible, yet I haven't found out how to do so.

You might want to check Forrest out of CVS and use its sitemap,
cli.xconf and Ant script as a reference.  Currently we're using the
new CLI (twice as fast as the old'un), but support for the old one is
still in forrest.build.xml (commented out), if you want to experiment
with that.
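
As a rough idea of what a cli.xconf for the new CLI contains, here is
a minimal sketch.  The element and attribute names are as I recall
them from the cli.xconf shipped with Cocoon 2.1 and should be checked
against that file; every path and the start URI are placeholders.

```xml
<?xml version="1.0"?>
<!-- Hypothetical cli.xconf; all paths below are placeholders -->
<cocoon verbose="true" follow-links="true"
        precompile-only="false" confirm-extensions="false">
  <!-- Directory containing sitemap.xmap -->
  <context-dir>.</context-dir>
  <config-file>WEB-INF/cocoon.xconf</config-file>
  <!-- Scratch space for temporary and cache files -->
  <work-dir>build/work</work-dir>
  <!-- Where the generated pages are written -->
  <dest-dir>build/site</dest-dir>
  <!-- Starting point; with follow-links="true" the crawl proceeds from here -->
  <uri src="index.html"/>
</cocoon>
```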


--Jeff


[1] http://xml.apache.org/forrest/


> I'd greatly appreciate any hints which may point me in the right direction.
> 
> Thanks and best regards,
> 
> Florian
> 
