forrest-dev mailing list archives

From Ross Gardler <rgard...@apache.org>
Subject Re: xml output plugin and filename extension .xml
Date Tue, 17 Jan 2006 16:23:06 GMT
Thorsten Scherler wrote:
> El mar, 17-01-2006 a las 23:49 +1100, David Crossley escribió:
> 
>>David Crossley wrote:
>>
>>>Ross Gardler wrote:
>>>
>>>>Is anyone familiar with configuration of the Cocoon crawler? We need to 
>>>>modify it so that it will follow links defined in whatever format the 
>>>>output document creates rather than just HTML format documents.
>>>
>>>In our main/webapp/WEB-INF/cli.xconf
>>>
>>>    |    confirm-extensions: check the mime type for the generated page
>>>    |                        and adjust filename and links extensions
>>>    |                        to match the mime type
>>>    |                        (e.g. text/html->.html)
>>>
>>>at the moment it is set to false.
>>>
>>>I have never understood how to use it.
>>>
>>>Are you suggesting that we might be able to get rid of
>>>the need for responding on filename extensions?
>>>
>>>http://cocoon.apache.org/2.1/userdocs/offline/
>>>http://wiki.apache.org/cocoon/CommandLine
>>>
>>>I notice from those docs that the default is
>>>confirm-extensions=true (opposite to us).
>>
>>I tried this today ...
>>
>>Edit main/webapp/WEB-INF/cli.xconf and
>>set "confirm-extensions=true".
>>
>>Do 'forrest site' ...
>>
>>* [1/0]     [0/0]     5.633s 10.5Kb  linkmap.html
>>Total time: 0 minutes 7 seconds,  Site size: 10,782 Site pages: 1
>>
>>So it processed the first page but did not gather any links
>>from the page (the third column numbers are empty).
>>
>>Unfortunately we cannot see any logs in 'forrest site' mode
>>due to issue:
>>
> 
> 
> Just a shot in the dark, we have/had a similar problem in v2. The
> crawler expects certain markup such as <a href=""/> AFAIR. 

According to the CLI docs (if I remember correctly) the crawler should 
follow links in @href, @src, etc. regardless of the parent element.

Not sure how this relates to your findings with v2.

> so I reckon you should try to add <a href="/"/> to your doc (if not already), which
> IMO should work. 

That would be a quick test. Try a few link types and destinations:

<link href="index.html">...</link>
<link href="index.xml">...</link>
<link src="index.html">...</link>

etc.
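
To see which targets a test document like that exposes if links are gathered
from @href and @src regardless of the parent element, here is a minimal
Python sketch. It is not the Cocoon crawler's code, only an illustration of
the behaviour as I remember the docs describing it. The names LINK_ATTRS and
gather_links, the sample document, and the choice of just those two
attributes are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the attributes treated as links (the docs may list more).
LINK_ATTRS = ("href", "src")


def gather_links(xml_text):
    """Return (element name, attribute, target) for each link-like attribute."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() walks every element in document order, whatever its name,
    # so the parent element plays no part in the decision.
    for elem in root.iter():
        for attr in LINK_ATTRS:
            if attr in elem.attrib:
                found.append((elem.tag, attr, elem.attrib[attr]))
    return found


# Hypothetical test document using the link types suggested above.
SAMPLE = """<document>
  <link href="index.html">a</link>
  <link href="index.xml">b</link>
  <link src="index.html">c</link>
  <a href="/"/>
</document>"""

if __name__ == "__main__":
    for tag, attr, target in gather_links(SAMPLE):
        print(tag, attr, target)
```

If the real crawler gathers fewer targets than this from the same document,
that would suggest it is keying on something else (element name or output
mime type) rather than on the attributes alone.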

Ross


