forrest-dev mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: file: implemented (Re: cvs commit: ...)
Date Fri, 13 Dec 2002 06:21:40 GMT
On Thu, Dec 12, 2002 at 11:10:29AM -0800, Stefano Mazzocchi wrote:
> Jeff Turner wrote:
> >On Thu, Dec 12, 2002 at 12:13:06AM -0800, Stefano Mazzocchi wrote:
> >
> >>Jeff Turner wrote:
> >>
> >>
> >>><rant>
> >>>The CLI is evil and should have been drowned at birth.  The Cocoon CLI
> >>>can best be described as a crappy 'wget' implementation tacked onto the
> >>>side of Cocoon.  It is slow as hell, full of bugs (eg css images) and
> >>>practically unmaintained.  Rewriting wget in a corner of Cocoon was a
> >>>blindingly stupid thing to do, and I am not about to waste my time fixing
> >>>its bugs.  I would rather find a _real_ wget implementation in Java, that
> >>>can handle CSS and doesn't do screwy things with filenames, and IF
> >>>invoking Cocoon through the HTTP interface proves too slow (unlikely),
> >>>then I'd wrap Cocoon in an Avalon block and feed it URLs passed over RMI.
> >>></rant>
> >>
> >>Jeff, tell me, are you aware of how *exactly* the Cocoon CLI works?
> >
> >
> >No.  <rant> should be <uninformed rant>.
> 
> When I talk about something I don't know, I tend to ask questions first, 
> then express my opinions. But that's me.

I was not talking about something I don't know: I was _ranting_ about
something whose code I am fairly familiar with, and of which I have 4
months of painful experience.  The <rant> tags are a hint that what
follows is not a carefully reasoned critique.  Webster's defines 'rant'
as:

  "To rave in violent, high-sounding, or extravagant language, without
  dignity of thought"

Please remember the context; Nicola was suggesting an implementation of a
new feature (schemes) that would tie Forrest even tighter to the CLI.  If
it helps, more context is that I was writing at 3am after a day's
fighting with Transformers :P

..
> The Cocoon CLI extensively uses the cocoon-view to do two major things:
> 
>  1) obtaining links
>  2) pushing back translated links
> 
> Cocoon CLI does link translation but it's Cocoon *ITSELF* that places 
> them in the right position and this happens *before* things get serialized.
> 
> If you go the wget path you have to implement a link parser and 
> translator for *every* hypertext-capable binary file our serializers 
> can come up with.

Or just hack it to support cocoon-view=links when it becomes necessary.
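
As a rough illustration of that idea, here is a minimal sketch of a crawler
that asks the server for each page's links instead of parsing the serialized
output.  It assumes that a request with cocoon-view=links returns a plain-text
body with one link per line; that response format, and the names
links_view_url, parse_links and crawl, are assumptions made for this sketch,
not anything taken from Cocoon or Forrest code.

```python
from collections import deque
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit


def links_view_url(url):
    """Return `url` with cocoon-view=links appended to its query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("cocoon-view", "links"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def parse_links(body, base_url):
    """Resolve each non-empty line of `body` against `base_url`.

    Assumption: the links view answers with one link per line.
    """
    links = []
    for line in body.splitlines():
        line = line.strip()
        if line:
            links.append(urljoin(base_url, line))
    return links


def crawl(start_url, fetch):
    """Breadth-first crawl driven by the links view.

    `fetch(url) -> str` is supplied by the caller (hypothetical hook), so
    the traversal can be exercised without a running server.
    """
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        body = fetch(links_view_url(url))
        for link in parse_links(body, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

A real tool would pass in a fetch function that does an HTTP GET and would
also save each page to disk; the sketch only covers link discovery.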

> On the other hand, by implementing a Cocoon-aware CLI, we are gaining 
> insights from the actual semantic content of the data and we can 
> manipulate it when it's *still* semantically meaningful (thus easier to 
> process).

cocoon-view=links returns links from the decidedly unsemantic HTML, in
order to get things like skin images.

> Don't know about others, but I think it's a much more elegant (and 
> code-wise cheaper) solution than a semantically-unaware wget-like one.

Yes, of course it's more elegant.  But _practically_, it is slow and full
of bugs which no-one has volunteered to fix, and Forrest is suffering
because of this.

Now why don't I stop whining, get in there and fix it?

Because in the long run, I would prefer to develop a separate wget-like
tool with cocoon-view hacks added to it, than to develop the CLI into a
full-blown threaded crawler.  Why?  Because a separate tool has a _much_
larger audience, so will evolve faster.  Yes, a Cocoon CLI may be more
elegant, but a separate tool can grow geometrically while the CLI grows
linearly.


--Jeff

> -- 
> Stefano Mazzocchi                               <stefano@apache.org>
> --------------------------------------------------------------------
> 
> 
