forrest-dev mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: CLI Reporting
Date Sat, 09 Aug 2003 09:42:20 GMT
On Sat, Aug 09, 2003 at 06:33:31AM +0100, Upayavira wrote:
> Dear Forresters,
> 
> I asked this of cocoon-dev and got no response, so I'll ask here.
> 
> I'm in the process of completing a significant rewrite of the Cocoon CLI, which I 
> hope the Cocoon and Forrest communities will accept. It supports most of the 
> existing functionality, but the code is much easier to follow, debug and enhance.

Cool :) We'll ship Forrest 0.5 with pretty much whatever you come up with
that has ignore-these-links support ;)  

> One consequence of this is that I can report a lot more of what is going on. I've got
> it reporting (to stdout) for each page:
> 	* the number of links per page
> 	* the number of as yet unvisited links per page
> 	* the time taken to generate the page
> 	* the actual links found in a page
> 	* whether those links are broken 
> 	* whether those links have already been added to the crawler's link list
> 
> I'll no doubt think of more things that can be reported. So, I have two questions:
> 
> 1) Are there other things you'd like to know, to give the process greater visibility?

IMO the current minimal output is fine.  If you'd like to report more
('time taken' would be useful), that's also fine.

What I'd *love* to see is better error messages when something breaks.
Specifically, when there is a broken link, I'd like to know which page
the link was in.  Currently there is no way to tell.  One just gets
errors like:

X [0] site:changes      BROKEN: No pipeline matched request: site:changes

Ideally one would get:

X [0] site:changes      BROKEN: No pipeline matched request: site:changes from page sitemap-ref.xml

Or even better, 

X [0] site:changes      BROKEN: No pipeline matched request: site:changes from page sitemap-ref.xml line 102
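
To make the idea concrete, here is a rough sketch (in Python rather than the CLI's Java, purely for brevity) of the bookkeeping this would need: the crawler remembers where it first saw each link, and the error line appends that location. The names `record_links` and `format_broken` are hypothetical and not part of the Cocoon CLI; the `X [0]` prefix is copied verbatim from the example above, without assuming what the number means.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical bookkeeping: link URI -> (page that contained it, line number or None).
Referrers = Dict[str, Tuple[str, Optional[int]]]


def record_links(referrers: Referrers, page: str,
                 links: Iterable[Tuple[str, Optional[int]]]) -> None:
    """Remember the first page (and line, if known) on which each link was seen."""
    for uri, line in links:
        referrers.setdefault(uri, (page, line))


def format_broken(uri: str, reason: str, referrers: Referrers) -> str:
    """Build a broken-link line, appending the referring page when it is known."""
    # The "X [0]" prefix is taken as-is from the sample output in this message.
    report = f"X [0] {uri}      BROKEN: {reason}"
    page, line = referrers.get(uri, (None, None))
    if page is not None:
        report += f" from page {page}"
        if line is not None:
            report += f" line {line}"
    return report


referrers: Referrers = {}
record_links(referrers, "sitemap-ref.xml", [("site:changes", 102)])
print(format_broken("site:changes",
                    "No pipeline matched request: site:changes",
                    referrers))
```

If no referrer was recorded for a link, the line degrades to the current format, so existing output would not change for entry pages.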

> 2) How should I report this information? There are three possibilities:
> 	* to the screen (results in a lot of info scrolling by)
> 	* to an XML file (extending the broken links xml file idea)
> 	* to the standard Cocoon log files (don't support structured data)

Perhaps real-time text, as currently done, with full XML logged at the
same time?  Then one day we could have a web interface for Forrest with a
"render this site to disk" button.   Once the CLI is done, we could
transform the output to HTML.
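
As a rough illustration of "text now, XML at the same time", something along these lines would do. It is a Python sketch, not CLI code: the class name `PageReporter`, the element names (`report`, `page`, `broken-link`) and the one-line text format are all invented for this example.

```python
import sys
import xml.etree.ElementTree as ET


class PageReporter:
    """Writes a one-line summary per page and accumulates the same data as XML."""

    def __init__(self, stream=sys.stdout):
        self.stream = stream
        self.root = ET.Element("report")  # hypothetical root element

    def page(self, uri, links, seconds, broken=()):
        # Real-time, human-readable line (format is made up for this sketch).
        self.stream.write(f"{uri}: {links} links, {seconds:.2f}s\n")
        # The same facts, recorded as structured data.
        node = ET.SubElement(self.root, "page", uri=uri,
                             links=str(links), seconds=f"{seconds:.2f}")
        for target in broken:
            ET.SubElement(node, "broken-link", uri=target)

    def to_xml(self):
        """Serialise everything reported so far as an XML string."""
        return ET.tostring(self.root, encoding="unicode")


reporter = PageReporter()
reporter.page("index.html", 3, 0.5, broken=["site:changes"])
xml_log = reporter.to_xml()  # could be written to disk and styled to HTML later
```

Keeping both outputs behind one call means the screen and the XML file cannot drift apart, and the XML can later be transformed without touching the crawler.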


--Jeff

> 
> Any thoughts?
> 
> Regards, Upayavira
