forrest-dev mailing list archives

From "Ross Gardler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (FOR-677) leading slash in gathered URIs causes double the number of links to be processed
Date Tue, 20 Sep 2005 23:08:27 GMT
    [ http://issues.apache.org/jira/browse/FOR-677?page=comments#action_12330051 ] 

Ross Gardler commented on FOR-677:
----------------------------------

We cannot always trim the slash because, as you know, the leading / takes us back to the site root.
The only other way of doing that would be "../../../" type patterns, but that is bad
since we then can't move the documents. One case where the slash is needed is a document included
from an external repository that wants to link back to a known page, for example the index
page with "/index.html".
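
To illustrate the trade-off, here is a minimal sketch (Python, not Forrest code) of how a
root-relative link could be turned into a "../" pattern at generation time, so the author
never has to hard-code it. The function name relativise and both parameter names are
hypothetical, and it assumes both arguments are site paths measured from the site root.

```python
import posixpath


def relativise(link: str, current_doc: str) -> str:
    """Rewrite a root-relative link so it is relative to current_doc.

    Assumption: current_doc is the path of the page being generated,
    measured from the site root. Links without a leading slash are
    returned unchanged.
    """
    if not link.startswith("/"):
        return link
    # Directory of the current page, expressed as an absolute site path.
    current_dir = posixpath.dirname("/" + current_doc.lstrip("/"))
    # Path from that directory to the link target.
    return posixpath.relpath(link, start=current_dir)


if __name__ == "__main__":
    print(relativise("/index.html", "samples/sub/faq.html"))
```

Because the "../" prefix is computed from wherever the page currently lives, moving the
document changes the output without touching the source link.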

We need to have Forrest "relativise" and "absolutise" the links, or make the linkmap intelligent
enough to realise that "root/index.html" is the same as "/root/index.html".

Not sure how to do this as I've never looked at the linkmap code.
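
As a rough sketch of the second option (Python, not the actual linkmap or Cocoon crawler
code), the crawler could reduce every gathered URI to one canonical key before checking
whether it has been seen. The names canonical, crawl and links_of are hypothetical, and the
sketch assumes every gathered URI is meant relative to the site root.

```python
import posixpath
from collections import deque


def canonical(uri: str) -> str:
    """Map 'root/index.html' and '/root/index.html' to the same key.

    Assumption: every gathered URI is relative to the site root.
    """
    return posixpath.normpath("/" + uri.lstrip("/"))


def crawl(start, links_of):
    """Visit each page once; links_of(key) returns the links found on a page."""
    seen = set()
    queue = deque([start])
    order = []
    while queue:
        key = canonical(queue.popleft())
        if key in seen:
            continue  # same page reached via the other spelling
        seen.add(key)
        order.append(key)
        queue.extend(links_of(key))
    return order


if __name__ == "__main__":
    site = {
        "/index.html": ["samples/faq.html", "/samples/faq.html"],
        "/samples/faq.html": ["/index.html", "/samples/faq.html"],
    }
    print(crawl("index.html", lambda key: site.get(key, [])))
```

With this, a link written with a leading slash no longer adds a second copy of each page
to the list of URIs still to be processed.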

> leading slash in gathered URIs causes double the number of links to be processed
> --------------------------------------------------------------------------------
>
>          Key: FOR-677
>          URL: http://issues.apache.org/jira/browse/FOR-677
>      Project: Forrest
>         Type: Bug
>   Components: Core operations
>     Versions: 0.8-dev, 0.7
>     Reporter: David Crossley
>      Fix For: 0.8-dev

>
> Running 'forrest' starts at the virtual document called linkmap.html, where the Cocoon crawler
> gathers the initial set of links, then starts crawling and generating pages. Any new links
> are pushed onto the linkmap. However, for some sites, such as our own "seed-sample" and our
> "site-author", there is a sudden jump in the number of URIs remaining to be processed.
> This is due to a URI with a leading slash (e.g. /samples/faq.html). When that URI is
> processed, it gains a whole new set of links, all with leading slashes, and so the list of
> URIs is potentially doubled.
> This issue could be due to a user error, i.e. adding a link that deliberately begins
> with a slash. Sometimes, that is unavoidable.
> However, we do have a sitemap transformer to "relativize" and "absolutize" the links.
> Should it always trim the leading slash? Or are there cases where that should not happen,
> so that we cannot generalise?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

