forrest-dev mailing list archives

From "Upayavira" ...@upaya.co.uk>
Subject Re: CLI Reporting
Date Sat, 09 Aug 2003 12:37:28 GMT
On 9 Aug 2003 at 19:42, Jeff Turner wrote:

> > I'm in the process of completing a significant rewrite of the Cocoon
> > CLI, which I hope the Cocoon and Forrest communities will accept. It
> > supports most of the existing functionality, but the code is much
> > easier to follow, debug and enhance.
> 
> Cool :) We'll ship Forrest 0.5 with pretty much whatever you come up
> with that has ignore-these-links support ;)  

Keep saying that and I'll get to it! I haven't yet reworked the xconf format. I'm sure I'll
add excludes easily enough when I get around to that.
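As a rough illustration of the ignore-these-links support under discussion, here is a minimal Java sketch of how a crawler might skip excluded links. It assumes excludes are simple glob patterns where "*" matches any run of characters; the class LinkExcludes and that pattern syntax are hypothetical, since the xconf format for declaring excludes hasn't been reworked yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical exclude filter. The glob syntax ("*" = any characters) is an
// assumption for illustration, not part of any existing Cocoon xconf format.
class LinkExcludes {
    private final List<Pattern> patterns = new ArrayList<>();

    // Convert a simple glob into an anchored regex, quoting everything but "*".
    void add(String glob) {
        String[] parts = glob.split("\\*", -1); // -1 keeps leading/trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        patterns.add(Pattern.compile(regex.toString()));
    }

    // True if the link matches any exclude pattern and should not be crawled.
    boolean isExcluded(String link) {
        for (Pattern p : patterns) {
            if (p.matcher(link).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LinkExcludes excludes = new LinkExcludes();
        excludes.add("apidocs/*");
        excludes.add("*.pdf");
        for (String link : new String[] {"index.html", "apidocs/overview.html", "manual.pdf"}) {
            System.out.println(link + (excludes.isExcluded(link) ? " -> skipped" : " -> crawled"));
        }
    }
}
```

Checking each link against the excludes before it is added to the crawler's link list means excluded pages are never generated at all, rather than being generated and then discarded.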

> > One consequence of this is that I can report a lot more of what is
> > going on. I've got it reporting (to stdout), for each page:
> >   * the number of links per page
> >   * the number of as yet unvisited links per page
> >   * the time taken to generate the page
> >   * the actual links found in a page
> >   * whether those links are broken
> >   * whether those links have already been added to the crawler's link list
> > 
> > I'll no doubt think of more things that can be reported. So, I have
> > two questions:
> > 
> > 1) Are there other things you'd like to know, to give the process
> > greater visibility?
> 
> IMO the current minimal output is fine.  If you'd like to report more
> ('time taken' would be useful), that's also fine.

Okay, so I'll add time taken to the screen output.
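A minimal Java sketch of how per-page timing could sit alongside the existing counts in the screen output. The class PageReport, its fields, and the one-line output format are hypothetical; the Supplier stands in for whatever actually generates the page and returns the links found in it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical per-page summary; field names and output format are
// illustrative assumptions, not the CLI's actual report.
class PageReport {
    final String uri;
    final int linkCount;
    final int unvisitedCount;
    final long millis;

    PageReport(String uri, int linkCount, int unvisitedCount, long millis) {
        this.uri = uri;
        this.linkCount = linkCount;
        this.unvisitedCount = unvisitedCount;
        this.millis = millis;
    }

    // One short line per page for stdout: the minimal output plus time taken.
    String toLine() {
        return String.format("* %s (%d links, %d new) %d ms", uri, linkCount, unvisitedCount, millis);
    }

    // Times the generation step, then counts links not yet in the crawler's set.
    static PageReport timeGeneration(String uri, Supplier<List<String>> generator, Set<String> seen) {
        long start = System.nanoTime();
        List<String> links = generator.get();
        long millis = (System.nanoTime() - start) / 1_000_000;
        int unvisited = 0;
        for (String link : links) {
            if (seen.add(link)) { // add() returns true only for links not seen before
                unvisited++;
            }
        }
        return new PageReport(uri, links.size(), unvisited, millis);
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        PageReport report = timeGeneration("index.html",
                () -> Arrays.asList("faq.html", "changes.html"), seen);
        System.out.println(report.toLine());
    }
}
```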

> What I'd *love* to see is better error messages when something breaks.
> Specifically, when there is a broken link, I'd like to know which page
> the link was in.  Currently there is no way to tell.  One just gets
> errors like:
> 
> X [0] site:changes      BROKEN: No pipeline matched request: site:changes
> 
> Ideally one would get:
> 
> X [0] site:changes      BROKEN: No pipeline matched request: site:changes from page sitemap-ref.xml

It'd take some thinking, but it should be doable. The complication is that a link might be
used in more than one place, so you'd need to report a broken link along with all the pages
that link to it, which is kind of the other way around from what I've been planning.
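Reporting a broken link together with all of its linking pages amounts to keeping a reverse index, from each link to the pages that contain it, as well as the usual page-to-links list. A minimal Java sketch, assuming the crawler can call a hook for every link it finds; ReferrerIndex and its methods are hypothetical names, not part of the Cocoon CLI.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical reverse-link index: records "link -> pages that contain it",
// so a broken link can be reported with every page that refers to it.
class ReferrerIndex {
    private final Map<String, Set<String>> referrers = new LinkedHashMap<>();

    // Call once for each link found while crawling a page.
    void recordLink(String fromPage, String link) {
        referrers.computeIfAbsent(link, k -> new LinkedHashSet<>()).add(fromPage);
    }

    // Pages that link to the given URI, in the order first seen (empty if none).
    Set<String> referrersOf(String link) {
        return referrers.getOrDefault(link, Collections.emptySet());
    }

    // Formats a broken-link message naming every referring page.
    String describeBroken(String link, String reason) {
        return link + " BROKEN: " + reason
                + " (linked from " + String.join(", ", referrersOf(link)) + ")";
    }

    public static void main(String[] args) {
        ReferrerIndex index = new ReferrerIndex();
        index.recordLink("sitemap-ref.xml", "site:changes");
        index.recordLink("index.xml", "site:changes");
        index.recordLink("index.xml", "faq.html");
        // prints: site:changes BROKEN: No pipeline matched request (linked from sitemap-ref.xml, index.xml)
        System.out.println(index.describeBroken("site:changes", "No pipeline matched request"));
    }
}
```

The cost is one extra set entry per link occurrence; since the broken-link report is produced after the crawler has already visited the referring pages, the index is complete by the time it is needed.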
 
> Or even better, 
> 
> X [0] site:changes      BROKEN: No pipeline matched request: site:changes from page sitemap-ref.xml line 102

Given the way that links are gathered (i.e. in a SAX pipeline), it won't be possible to
calculate line numbers.
 
> > 2) How should I report this information? There are three
> > possibilities:
> >   * to the screen (results in a lot of info scrolling by)
> >   * to an XML file (extending the broken links XML file idea)
> >   * to the standard Cocoon log files (which don't support structured data)
> 
> Perhaps real-time text, as currently done, with full XML logged at the
> same time?  Then one day we could have a web interface for Forrest
> with a "render this site to disk" button.   Once the CLI is done, we
> could transform the output to HTML.

Okay. So we'll have minimal output to stdout, plus an XML file logging what's been
going on. I'll use SAX rather than DOM to create that XML, so that it'll be ready
for a decent Cocoon-based web interface (such as Unico's publishingService).
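A minimal Java sketch of writing the crawl log as SAX events through the standard JAXP identity TransformerHandler, which serializes whatever events it receives to XML text. The class SaxCrawlLog and the element and attribute names (report, page, link, uri, time, href, status) are hypothetical; the actual log format hasn't been designed.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical crawl log built from SAX events; element and attribute names
// are illustrative assumptions.
class SaxCrawlLog {
    private final TransformerHandler out;

    SaxCrawlLog(StreamResult result) throws TransformerConfigurationException, SAXException {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        out = factory.newTransformerHandler(); // identity transform: SAX events -> XML text
        out.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        out.setResult(result);
        out.startDocument();
        out.startElement("", "report", "report", new AttributesImpl());
    }

    // Opens a <page> element carrying the page URI and generation time in ms.
    void startPage(String uri, long millis) throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "uri", "uri", "CDATA", uri);
        atts.addAttribute("", "time", "time", "CDATA", Long.toString(millis));
        out.startElement("", "page", "page", atts);
    }

    // Emits an empty <link> element for one link found on the current page.
    void link(String href, String status) throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "href", "href", "CDATA", href);
        atts.addAttribute("", "status", "status", "CDATA", status);
        out.startElement("", "link", "link", atts);
        out.endElement("", "link", "link");
    }

    void endPage() throws SAXException {
        out.endElement("", "page", "page");
    }

    // Closes the root element and flushes the document.
    void close() throws SAXException {
        out.endElement("", "report", "report");
        out.endDocument();
    }

    public static void main(String[] args) throws Exception {
        StringWriter xml = new StringWriter();
        SaxCrawlLog log = new SaxCrawlLog(new StreamResult(xml));
        log.startPage("sitemap-ref.html", 42);
        log.link("site:changes", "broken");
        log.link("index.html", "ok");
        log.endPage();
        log.close();
        System.out.println(xml);
    }
}
```

Because nothing is held in memory as a tree, the same sequence of calls could be pointed at any other SAX consumer instead of a file, which is the property that makes SAX a better fit than DOM for a later web interface.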

Thanks for this. I'll see what I can get going, and then put what I've got into the 
Cocoon scratchpad. I hope you'll be willing to give it a go.

Regards, Upayavira

