From: "Tim Williams (JIRA)"
To: dev@forrest.apache.org
Date: Thu, 3 Dec 2009 02:43:20 +0000 (UTC)
Subject: [jira] Updated: (FOR-677) leading slash in gathered URIs causes double the number of links to be processed
Message-ID: <1612242945.1259808200793.JavaMail.jira@brutus>
In-Reply-To: <268295816.1127255848351.JavaMail.jira@ajax.apache.org>

     [ https://issues.apache.org/jira/browse/FOR-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Williams updated FOR-677:
-----------------------------

    Fix Version/s:     (was: 0.9-dev)
                       0.10

Moving to next release.

> leading slash in gathered URIs causes double the number of links to be processed
> --------------------------------------------------------------------------------
>
>                 Key: FOR-677
>                 URL: https://issues.apache.org/jira/browse/FOR-677
>             Project: Forrest
>          Issue Type: Bug
>          Components: Core operations
>    Affects Versions: 0.7, 0.8
>            Reporter: David Crossley
>             Fix For: 0.10
>
>
> Doing 'forrest' starts at the virtual document called linkmap.html where the Cocoon crawler gathers the initial set of links, then starts crawling and generating pages. Any new links are pushed onto the linkmap. However, for some sites, such as our own "seed-sample" and our "site-author", there is a sudden jump in the number of URIs remaining to be processed.
> This is due to a URI with a leading slash (e.g. /samples/faq.html). When that URI is processed, it gains a whole new set of links all with leading slashes, and so the list of URIs is potentially doubled.
> This issue could be due to a user error, i.e. adding a link that deliberately begins with a slash. Sometimes, that is unavoidable.
> However, we do have a sitemap transformer to "relativize" and "absolutize" the links. Should it always trim the leading slash? Or are there cases where that should not happen, so cannot generalise?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
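
The doubling described in the issue can be reproduced with a toy crawler. This is only a sketch under stated assumptions, not Forrest or Cocoon code: SITE, links_of, normalize and crawl are hypothetical names, and links_of merely imitates the reported behaviour (a page reached through a leading-slash URI yields links that all carry a leading slash). Whether trimming the slash is always safe is exactly the open question in the issue.

```python
from collections import deque

# Hypothetical three-page site; one link is written with a leading slash.
SITE = {
    "index.html": ["samples/index.html", "/samples/faq.html"],
    "samples/index.html": ["samples/faq.html", "index.html"],
    "samples/faq.html": ["index.html", "samples/index.html"],
}


def links_of(uri):
    """Imitate the reported behaviour: a page reached via a leading-slash
    URI returns all of its links with a leading slash."""
    links = SITE[uri.lstrip("/")]
    if uri.startswith("/"):
        return ["/" + link.lstrip("/") for link in links]
    return links


def normalize(uri):
    """Trim leading slashes so both spellings of a URI compare equal."""
    return uri.lstrip("/")


def crawl(start, trim_leading_slash):
    """Breadth-first crawl; returns the URIs in the order processed."""
    key = normalize if trim_leading_slash else (lambda uri: uri)
    seen = {key(start)}
    queue = deque([start])
    processed = []
    while queue:
        uri = queue.popleft()
        processed.append(uri)
        for link in links_of(uri):
            if key(link) not in seen:
                seen.add(key(link))
                queue.append(link)
    return processed


if __name__ == "__main__":
    print(len(crawl("index.html", trim_leading_slash=False)))
    print(len(crawl("index.html", trim_leading_slash=True)))
```

In this toy site the untrimmed crawl processes every page twice, once per spelling, while the trimmed crawl processes each page once.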