forrest-dev mailing list archives

From Stefano Mazzocchi <>
Subject Re: file: implemented (Re: cvs commit: ...)
Date Fri, 13 Dec 2002 07:48:49 GMT
Jeff Turner wrote:

>>>>>The CLI is evil and should have been drowned at birth.  The Cocoon CLI
>>>>>can best be described as a crappy 'wget' implementation tacked onto the
>>>>>side of Cocoon.  It is slow as hell, full of bugs (eg css images) and
>>>>>practically unmaintained.  Rewriting wget in a corner of Cocoon was a
>>>>>blindingly stupid thing to do, and I am not about to waste my time fixing
>>>>>its bugs.  I would rather find a _real_ wget implementation in Java, that
>>>>>can handle CSS and doesn't do screwy things with filenames, and IF
>>>>>invoking Cocoon through the HTTP interface proves too slow (unlikely),
>>>>>then I'd wrap Cocoon in an Avalon block and feed it URLs passed over RMI.
>>>>Jeff, tell me, are you aware of how *exactly* the Cocoon CLI works?
>>>No.  <rant> should be <uninformed rant>.
>>When I talk about something I don't know, I tend to ask questions first,
>>then express my opinions. But that's me.
> I was not talking about something I don't know: I was _ranting_ about
> something whose code I am fairly familiar with, and of which I have 4
> months of painful experience.  The <rant> tags are a hint that what
> follows is not a carefully reasoned critique.  Websters defines 'rant'
> as:
>   "To rave in violent, high-sounding, or extravagant language, without
>   dignity of thought"
> Please remember the context; Nicola was suggesting an implementation of a
> new feature (schemes) that would tie Forrest even tighter to the CLI.  If
> it helps, more context is that I was writing at 3am after a day's
> fighting with Transformers :P

I know I shouldn't (and I'm getting better at that year after year), but
when somebody says that I did a "blindingly stupid" thing, I tend to get
pissed no matter what their context is :-Prrr

>>The Cocoon CLI extensively uses the cocoon-view to do two major things:
>> 1) obtaining links
>> 2) pushing back translated links
>>Cocoon CLI does link translation but it's Cocoon *ITSELF* that places
>>them in the right position and this happens *before* things get serialized.
>>If you go the wget path you have to implement a link parser and
>>translator for *every* hypertext-capable binary file format our serializers
>>can come up with.
> Or just hack it to support cocoon-view=links when it becomes necessary.

FYI, the Cocoon CLI uses link views for both GET and POST. The GET
request retrieves the list of hyperlinks that depart from that resource;
the POST request sends back the list of "translated links" that Cocoon
must substitute right before serializing.
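
To make the two phases concrete, here is a minimal sketch in Python. It is
not the Cocoon CLI code: the in-memory SITE table stands in for a running
Cocoon, and get_links, translate, and crawl are hypothetical names. The
rule that a directory URI maps to index.html is also just an assumption
for the example.

```python
from collections import deque

# Hypothetical stand-in for a Cocoon instance: each URI maps to the links
# that its link view would report. The real CLI asks Cocoon for these.
SITE = {
    "index.html": ["docs/", "images/logo.png"],
    "docs/": ["index.html"],
    "images/logo.png": [],
}


def get_links(uri):
    """Phase 1 (the GET side): list the hyperlinks departing from a resource."""
    return SITE[uri]


def translate(uri):
    """Map a URI to a filename on disk (assumed rule: directories get index.html)."""
    return uri + "index.html" if uri.endswith("/") else uri


def crawl(start):
    """Walk the site breadth-first and build, for every page, the table of
    translated links that phase 2 (the POST side) would hand back so the
    links can be rewritten before serialization."""
    seen, queue, tables = {start}, deque([start]), {}
    while queue:
        uri = queue.popleft()
        links = get_links(uri)
        tables[uri] = {link: translate(link) for link in links}
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return tables
```

Calling crawl("index.html") visits all three resources and yields one
translation table per page; in the real setup each table would travel back
in the POST request for that page.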

If you decouple the CLI from Cocoon, that POST view must be made public,
and this can create a *major* security hole, basically allowing anybody
to come up with a page whose links are translated with client-injected
information! Which is cross-site scripting attacks for dummies!
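
A small Python sketch of the problem, under stated assumptions: the
function names are hypothetical, and the "relative targets only" policy is
just one possible guard, not something Cocoon implements. The first
function rewrites links with whatever table the caller supplies, as a
publicly exposed POST view would; the second drops translations that carry
a scheme or a host.

```python
from urllib.parse import urlsplit


def apply_translations(links, table):
    """Rewrite each link using the caller-supplied table, with no checks."""
    return [table.get(link, link) for link in links]


def is_safe_target(target):
    """Assumed policy: accept only relative targets with no scheme or host."""
    parts = urlsplit(target)
    return not parts.scheme and not parts.netloc


def apply_checked(links, table):
    """Same rewrite, but keep the original link when the translation fails the policy."""
    return [
        table[link] if link in table and is_safe_target(table[link]) else link
        for link in links
    ]


links = ["docs/", "logo.png"]
# A hostile client controls the translation table.
hostile = {"docs/": "javascript:alert(1)", "logo.png": "//evil.example/x.png"}
```

With the hostile table, apply_translations(links, hostile) puts both
injected targets into the page, while apply_checked(links, hostile) leaves
the original links untouched.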

Believe me, dude, I thought about this so much when I designed the
CLI that my head hurt, and when I tried to discuss it on the mailing list
*nobody* cared (at that point, I think only a few people even
*understood* what a cocoon view was supposed to be).

But nothing is carved in stone and I don't care what solution we (in 
forrest) can come up with.

>>On the other hand, by implementing a Cocoon-aware CLI, we are gaining 
>>insights from the actual semantic content of the data and we can 
>>manipulate it when it's *still* semantically meaningful (thus easier to
> cocoon-view=links returns links from the decidedly unsemantic HTML, in
> order to get things like skin images.

??? The hyperlink semantics in HTML are the one thing that is
semantically carved in stone on the web. Otherwise, there wouldn't be
any Google out there.

>>Don't know about others, but I think it's a much more elegant (and 
>>code-wise cheaper) solution than a semantically-unaware wget-like one.
> Yes, of course it's more elegant.  But _practically_, it is slow and full
> of bugs which no-one has volunteered to fix, and Forrest is suffering
> because of this.
> Now why don't I stop whining, get in there and fix it?
> Because in the long run,  I would prefer to develop a separate wget-like
> tool with cocoon-view hacks added to it, than to develop the CLI into a
> full-blown threaded crawler.  Why?  Because a separate tool has a _much_
> larger audience, so will evolve faster.  Yes, a Cocoon CLI may be more
> elegant, but a separate tool can grow geometrically while the CLI grows
> linearly.

Hey, know what? You'd have my full support if you took some of the CLI
code out of Cocoon and made it part of Forrest (not all of it, since some
XSP precompilation technology uses it), because I agree with you: the wrong
community is currently maintaining that code.

[BTW, to give you context, I'm writing this while Jon (Stevens) saw me 
replying to this and now he's going around the house saying 'anakia 
rulez', 'dvsl is the way to go', 'you have to figure out a way to beat 
anakia's speed or you're doomed'... gotta love open source! :)]

Stefano Mazzocchi                               <>
