From: Rasik Pandey
Reply-To: dev@forrest.apache.org
To: dev@forrest.apache.org
Subject: Re: Add support for Googles sitemap protocol?
Date: Thu, 14 Jul 2005 01:43:55 -0400
Message-ID: <80d298bc0507132243420ef815@mail.gmail.com>
In-Reply-To: <42D5A724.4000808@apache.org>

Hi Ross,

 
> This is a good point. How about also providing a generator that
> would get the last modified header of remote resources. The results of
> the two could be aggregated together.

I think http://svn.apache.org/repos/asf/cocoon/trunk/src/java/org/apache/cocoon/generation/LinkStatusGenerator.java would do the trick, although it would have to be modified to make a call to get the "last-modified" header, so hopefully we could get that added to a future release of Cocoon. From a quick look at the code, it appears to crawl a URL and generate an XML report, with support for include and exclude expressions.
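
A minimal sketch of the "last-modified" lookup discussed above, assuming a plain HTTP HEAD request is enough. The class `LastModifiedProbe` and its methods are hypothetical names for illustration and are not part of Cocoon or LinkStatusGenerator; the sketch also uses `java.time`, which is newer than the Cocoon code under discussion.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper for illustration only; not Cocoon or Forrest code.
class LastModifiedProbe {

    // Sends a HEAD request so that only headers are transferred, then
    // returns the Last-Modified time if the server supplied one.
    static Optional<Instant> fetchLastModified(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            return toInstant(conn.getLastModified());
        } finally {
            conn.disconnect();
        }
    }

    // HttpURLConnection.getLastModified() returns 0 when the header is
    // missing, so 0 is mapped to "unknown" rather than to 1970-01-01.
    static Optional<Instant> toInstant(long epochMillis) {
        return epochMillis > 0
                ? Optional.of(Instant.ofEpochMilli(epochMillis))
                : Optional.empty();
    }
}
```

The `Instant` prints in ISO 8601 form, which could be reused for a sitemap `<lastmod>` value. Resources that report a change on every request would still need separate handling.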
 
> However, this still is not totally robust, because some remote resources
> will always indicate that they have changed even when the content has
> not (for example Daisy tracks changes to meta-data that Forrest does not
> currently use).

What strategy, if any, do you propose to handle this case?

>> Are you familiar with
>> http://cocoon.apache.org/2.1/userdocs/generators/linkstatus-generator.html ?
>> The documentation is skimpy, but it may be what we need to handle both
>> static and dynamic cases.

> No I'm not familiar. I wonder what the docs mean by "status". Will it
> provide the last modified header as suggested above?

 See above...

> I don't have the time to experiment with it now, but I (and I am sure
> other devs) would love to hear about your findings.

See above...

>> I may need some assistance to know how to build in hooks from
>> skinconf.xml to the sitemap format generation.
 
> I'm not sure what you mean by that. But there are plenty of people here
> to answer your questions as they arise.

I am sure users will need to specify configuration for this, such as the includes/excludes for the LinkStatusGenerator crawl and perhaps the <changefreq> value for the Google sitemap format. Can you give me a quick overview of how params make it from skinconf.xml to the sitemap(s) or XSL(s)?
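
To make the kind of configuration meant here concrete, the following is a rough sketch of how include/exclude expressions and a `<changefreq>` value might be applied when emitting sitemap entries. It assumes the expressions are regular expressions, and the class `SitemapEntryWriter` and its parameters are hypothetical. It does not show how Forrest actually passes skinconf.xml values to sitemaps or stylesheets, which is the open question above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration; names are not taken from Forrest or Cocoon.
class SitemapEntryWriter {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;
    private final String changefreq; // e.g. "daily", "weekly", "monthly"

    SitemapEntryWriter(List<String> includes, List<String> excludes, String changefreq) {
        this.includes = includes.stream().map(Pattern::compile).collect(Collectors.toList());
        this.excludes = excludes.stream().map(Pattern::compile).collect(Collectors.toList());
        this.changefreq = changefreq;
    }

    // A URL is kept when it matches some include (or no includes are
    // configured) and matches none of the excludes.
    boolean accepts(String url) {
        boolean included = includes.isEmpty()
                || includes.stream().anyMatch(p -> p.matcher(url).find());
        boolean excluded = excludes.stream().anyMatch(p -> p.matcher(url).find());
        return included && !excluded;
    }

    // Builds one <url> element; lastmod may be null when it is unknown.
    String urlElement(String loc, String lastmod) {
        StringBuilder sb = new StringBuilder("<url><loc>").append(escape(loc)).append("</loc>");
        if (lastmod != null) {
            sb.append("<lastmod>").append(lastmod).append("</lastmod>");
        }
        return sb.append("<changefreq>").append(changefreq).append("</changefreq></url>").toString();
    }

    // Escapes the characters that would otherwise break the XML text node.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

In a Forrest setup the same filtering and element construction would more likely live in a generator and an XSL stylesheet; the Java form is used here only to keep the sketch self-contained.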


Regards,
Rus
http://www.discountdracula.com



