forrest-dev mailing list archives

From "Upayavira" ...@upaya.co.uk>
Subject Re: CLI Reporting
Date Sat, 09 Aug 2003 18:43:32 GMT
Jeff,

> > > X [0] site:changes      BROKEN: No pipeline matched request:
> > > site:changes from page sitemap-ref.xml
> > 
> > It'd take some thinking, but it should be doable. Partly because
> > that link might be used in more than one place, so you'd need to
> > report a broken link and all its linking pages, which is kinda the
> > other way around from what I've been planning.
> 
> Each broken link could be reported as it is encountered:

Yes, but that means remembering the 'parent' whenever you add a link to be crawled. 
And as you find more links to uncrawled pages, you can add another parent to the 
page. You only find out that a page is broken when you try spidering it, and you might 
have seen ten links to it already. Do you show just one, or all ten?
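To make the "show all ten" option concrete, here is a rough sketch. It is not Cocoon's actual CLI code, and the class name ParentTrackingQueue and its methods are hypothetical. The link sampler calls addLink() for every link it sees, so a URI is queued only once but accumulates every page that referred to it. When the crawler later finds that URI broken, reportBroken() prints one line per referring page.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical crawl queue that remembers every page linking to each URI. */
class ParentTrackingQueue {
    // URI -> pages that link to it, in the order the links were found
    private final Map<String, Set<String>> parents = new LinkedHashMap<>();
    // URIs waiting to be crawled; each URI is queued at most once
    private final Deque<String> pending = new ArrayDeque<>();

    /** Called by the link sampler for every link found on parentPage. */
    void addLink(String uri, String parentPage) {
        Set<String> seenFrom = parents.get(uri);
        if (seenFrom == null) {
            // First sighting: remember it and queue it for crawling
            seenFrom = new LinkedHashSet<>();
            parents.put(uri, seenFrom);
            pending.add(uri);
        }
        // Every sighting (first or later) records its parent page
        seenFrom.add(parentPage);
    }

    /** Next URI to crawl, or null when the queue is empty. */
    String next() {
        return pending.poll();
    }

    /** Called at crawl time, when the URI turns out to be broken: one line per parent. */
    String reportBroken(String uri, String reason) {
        StringBuilder sb = new StringBuilder();
        for (String parent : parents.getOrDefault(uri, Collections.emptySet())) {
            sb.append("X ").append(uri).append("  BROKEN: ").append(reason)
              .append(" from page ").append(parent).append('\n');
        }
        return sb.toString();
    }
}
```

One limitation: a link found after its target has already been crawled and reported only gets added to the parent set, so it would be missing from the lines already printed. Collecting everything and reporting at the end avoids that, at the cost of losing the real-time output.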

> X [0] site:changes      BROKEN: No pipeline matched request:
> site:changes from page index.xml
> ....
> X [0] site:changes      BROKEN: No pipeline matched request:
> site:changes from page sitemap-ref.xml
> ....
> 
> Meaning only one link, to the page's parent, need be recorded when the
> link sampler encounters it.

But the broken link may be found when it is crawled, not when the link sampler sees 
it.
 
> > > Or even better, 
> > > 
> > > X [0] site:changes      BROKEN: No pipeline matched request:
> > > site:changes from page sitemap-ref.xml line 102
> > 
> > Given the way that links are gathered, it won't be possible to
> > calculate line numbers (i.e. in a SAX pipeline).
> 
> Well there's the org.xml.sax.Locator object, but I don't know if
> Cocoon does much with it.

I don't think so, as I remember others talking about that problem.
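For what it's worth, this is roughly how the Locator behaves when you parse a source file directly with the JDK's SAX parser. It is a hypothetical sketch (LinkLineRecorder and scan() are made-up names) and says nothing about whether Cocoon passes a Locator through its pipelines, which is the open question here. It only helps when links are gathered from the raw source file, not from events emitted further down a pipeline. Parsers are not obliged to supply a Locator, and getLineNumber() is an approximation (the line where the start tag ends), so the code falls back to -1.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records each href attribute with the line the parser reports. */
class LinkLineRecorder extends DefaultHandler {
    private Locator locator; // may stay null: parsers are not required to supply one
    final List<String> links = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String href = atts.getValue("href");
        if (href != null) {
            // Line where the start tag ends, or -1 when no locator is available
            int line = (locator != null) ? locator.getLineNumber() : -1;
            links.add(href + " line " + line);
        }
    }

    /** Parses an XML string and returns "href line N" entries for every href found. */
    static List<String> scan(String xml) {
        LinkLineRecorder handler = new LinkLineRecorder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return handler.links;
    }
}
```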
 
> > > > 2) How should I report this information? There are three
> > > > possibilities:
> > > >   * to the screen (results in a lot of info scrolling by)
> > > >   * to an XML file (extending the broken links XML file idea)
> > > >   * to the standard Cocoon log files (which don't support
> > > >     structured data)
> > > 
> > > Perhaps real-time text, as currently done, with full XML logged at
> > > the same time?  Then one day we could have a web interface for
> > > Forrest with a "render this site to disk" button.   Once the CLI
> > > is done, we could transform the output to HTML.
> > 
> > Okay. So we have minimal output to stdout, and XML generated to log
> > what's been going on. And I'll use SAX for creating that XML rather
> > than DOM so that it'll be ready for a decent Cocoon-based web
> > interface (such as Unico's publishingService).
> > 
> > Thanks for this. I'll see what I can get going, and then put what
> > I've got into the Cocoon scratchpad. I hope you'll be willing to
> > give it a go.
> 
> Certainly will.  Thanks!

Great. I'll get on and code something tomorrow, and will just come up with something 
that you can comment upon.
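As a starting point for comments, a rough sketch of the "minimal stdout plus SAX-built XML" idea using plain JAXP (javax.xml.transform.sax) rather than Cocoon's own classes. The class CrawlReport, the element and attribute names (report, page, uri, status, parent) and the "*"/"X" stdout markers are all assumptions for illustration. An identity TransformerHandler turns the SAX events into XML text, so the same events could later be fed into a Cocoon pipeline instead of a file.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical CLI reporter: one short line to stdout, full details as SAX events. */
class CrawlReport {
    private final TransformerHandler out;

    CrawlReport(StringWriter sink) throws Exception {
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        out = tf.newTransformerHandler(); // identity transform: SAX events in, XML text out
        out.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        out.setResult(new StreamResult(sink));
        out.startDocument();
        out.startElement("", "report", "report", new AttributesImpl());
    }

    /** Logs one crawled page; parent may be null for the start page. */
    void page(String uri, String status, String parent) throws SAXException {
        // Minimal real-time feedback on stdout
        System.out.println(("ok".equals(status) ? "* " : "X ") + uri);
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "uri", "uri", "CDATA", uri);
        atts.addAttribute("", "status", "status", "CDATA", status);
        if (parent != null) {
            atts.addAttribute("", "parent", "parent", "CDATA", parent);
        }
        out.startElement("", "page", "page", atts);
        out.endElement("", "page", "page");
    }

    /** Closes the root element and flushes the XML to the sink. */
    void close() throws SAXException {
        out.endElement("", "report", "report");
        out.endDocument();
    }
}
```

Because nothing is held in a DOM tree, the report can be streamed to disk as the crawl proceeds, which matters for large sites.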

Regards, Upayavira

