cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [ANN] Thanks for it...
Date Sat, 22 May 2004 19:00:20 GMT
Pier Fumagalli wrote:

> On 22 May 2004, at 09:13, Ugo Cei wrote:
> 
>> On 21 May 2004, at 18:11, Pier Fumagalli wrote:
>>
>>> From this morning at 8:32 AM (BST) http://www.vnunet.com/ is running 
>>> off a standard 2.1.5 (head) distribution of Cocoon, Apache 2.0.49 w/ 
>>> mod_cache, Jetty 4.2.19, and a hint of my take on the Cocoon kernel 
>>> empowering the backend XML data repository...
>>
>>
>> Congratulations! Can you give us some more details? How many pages are 
>> you serving daily and on which hardware, for instance? I think success 
>> stories like yours are important to demonstrate that Cocoon is able to 
>> serve lots of content with good performance.
> 
> 
> Well, let's say that Cocoon is most definitely NOT the "performant" 
> component on the site...
> 
> The pages are generated going through something like 2 megs of 
> aggregated XML documents, and given the structure of the site (and the 
> fact that we're still not 100% confident) we're using non-caching 
> pipelines...
> 
> In other words, it takes us roughly between 1 and 2 seconds to generate 
> one single HTML page (whoa, Bessie)...
> 
> But it's all cached on the front end by Apache's mod_disk_cache, so, in 
> terms of performance, we don't seem to hit major problems.
> 
> And seriously, we don't care much "how long" it takes to create a 
> page... We're a news site, so the variation on URLs requested in a day 
> is not much (currently my cache is filled up with something like 2000 
> documents, even if you can have access to almost 100k articles on the 
> site).
> 
> And the architecture (with caching up front powered directly by Apache) 
> allows us to withstand "slashdot-like" attacks very easily (the first 
> one coming in generates the request, all the remaining freaks get the 
> copy cached off on the disk)...
> 
> It was a weird change from JSPs because those were never cached, and we 
> had to put a lot of effort into actually making the JSP engine and code 
> "fast"... With Cocoon, well, we knew we wouldn't have been able to, so 
> we thought out other ways to deal with it, and (more importantly) it 
> forced us to think of a better and more scalable architecture...
> 
> One example above all: advertisement tags... Before, a lot of the 
> advertisement code was generated on the server on a PER REQUEST basis... 
> Now, we can't do this anymore because of the load that it would put on 
> our server, so we had to re-engineer how to serve ads, relying (for 
> instance) more on the client javascript engine... But the knowledge that 
> _we_can_not_ pass every single request through to Cocoon helped us, in 
> the sense that it made us aware of all those problems that (for 
> instance) prevented us from deploying the same application on several 
> different machines at the same time (so, no fault tolerance, no load 
> balancing, no nothing)...
> 
> Now, the AMAZING thing was the SPEED at which the site was developed... 
> Three weeks for the whole shebang...
> 
> Do that with JSP, yeah, right! :-)
> 
> The severe and "restrictive" contracts that Cocoon imposes on the users 
> of its services might seem harsh at first (the "how do I do this, Cocoon 
> doesn't do that" syndrome was felt quite strongly at the beginning), but 
> on the other hand, it forced us to _THINK_... To think about what we 
> wanted our website to do, and how each single aspect of it related to 
> the rest of the site. Yes, we wrote some small hacks, or shortcuts, but 
> amazingly enough, after the first one and a half weeks spent by Jerm 
> getting all the information sorted out (with nothing moving forward and 
> my manager freaking out), the rest of the functionality came out in the 
> remaining two... And we have a TON of pages up there...
> 
> It proved to me (and to my managers, and to the rest of the team) that 
> limitations in contracts, and clearly defined rules and boundaries that 
> you cannot step outside of, even if they MIGHT seem counterproductive at 
> first, are clearly an advantage in developing and managing complex 
> projects...
> 
> In terms of what you ask about performance and so on, I still don't have 
> many figures beyond what I mentioned above... I know for sure that there's 
> a HELL-OF-A-LOT that we can (and we will) improve; for now, we decided 
> that no matter what, we had enough hardware to throw at the baby to 
> match any possible requirement...
> 
> We started off thinking about 4 machines (HP/380s running Linux w/ 2 
> Gigs o' ram and two 3.2 GHz procs each, in other words, big stuff)... We 
> already scaled down to only two of those (and we kept two not for 
> performance, but for failover)...
> 
> In the future I think that we're going to use all four of them (once all 
> the sites we host have been moved to Cocoon), but maybe separate out the 
> hardware by classes of functionality (two for serving/caching, two for 
> generating content), but we'll see how the baby adapts and how it 
> behaves over the next few weeks...
> 
> For now I'm happy that it works, it works better than expected, and that 
> the concepts behind the machinery are stronger than any possible 
> performance hack you can possibly think of: if you need speed, even if 
> Cocoon is not _THAT_ fast, you can get it to serve the heck out of it 
> anyhow. You only need to _THINK_ about your problems and not rely on 
> some magic software to magically run your badly-designed web-application 
> fast enough! :-P

Well, tell you what: I wrote my thesis on this and I did some heuristic 
analysis, but seeing it working for real on one of the big sites out 
there... gee, I'm a happy man, very happy.

:-D

-- 
Stefano, who today had a (very friendly) discussion with TimBL face to 
face in front of the web-dev crowd about the future of URIs... boy, 
weird day today.
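
The front-end caching behaviour described in the quoted message (the first
request for a URL triggers the slow page generation, every later request
gets the stored copy) can be sketched roughly as below. This is only an
illustration in Python, not the Apache mod_disk_cache setup actually used
on the site: the class name FrontCache, the render callback, the TTL value
and the in-memory dictionary are all assumptions made for the example.

```python
import threading
import time


class FrontCache:
    """Toy in-memory stand-in for a front-end page cache (hypothetical)."""

    def __init__(self, render, ttl_seconds=60.0, clock=time.monotonic):
        self._render = render        # slow backend, e.g. a page generator
        self._ttl = ttl_seconds      # how long a stored copy counts as fresh
        self._clock = clock          # injectable so expiry can be tested
        self._lock = threading.Lock()
        self._store = {}             # url -> (expiry_time, body)
        self.backend_hits = 0        # number of times the backend was called

    def get(self, url):
        # The lock is held while rendering, which serialises every request.
        # That is crude, but it means a burst of requests for a cold URL
        # waits for the first render instead of hitting the backend again.
        with self._lock:
            now = self._clock()
            entry = self._store.get(url)
            if entry is not None and entry[0] > now:
                return entry[1]      # fresh copy: backend is not touched
            body = self._render(url)
            self.backend_hits += 1
            self._store[url] = (now + self._ttl, body)
            return body
```

With a render function that takes one or two seconds, only the first
caller for a given URL pays that cost until the stored copy expires; a
real deployment would more likely use a per-URL lock or leave this to the
HTTP server's own cache, as the quoted message does.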

