Subject: Re: Problem with CVS of Cocoon-Documentation & an AuthoringQuestion
From: David Crossley
To: dev@cocoon.apache.org
Date: 03 Dec 2003 14:06:38 +1100
Message-Id: <1070420796.1236.2678.camel@ighp>
In-Reply-To: <3FCB7149.3020904@gmx.de>
References: <1070284693.1236.1389.camel@ighp> <3FCB7149.3020904@gmx.de>

Joerg Heinicke wrote:
> David Crossley wrote:
>
> >>>Would it be acceptable to infrastructure@ apache? It is not just
> >>>one simple DTD you are downloading, there are many included bits.
> >>>
> >>>I do not agree with the approach. Cocoon has encouraged people to
> >>>use the entity resolver.
> >>>We should not join their bad practice.
>
> What's exactly the problem with it? In most (?) cases the entity
> resolver would jump in. Only when a parser does not understand the
> concept of the resolvers, the live access would jump in.

I think that it is bad practice to ever enable an xml tool
to clumsily drag the DTDs across the network. Website and network
efficiency has always been a big driver for me.

I do not know what percentage of requests would result in this.
What do other people think? (See stats below.)

> > Also if people are working with Cocoon documentation, then
> > they already have the DTDs with the distribution.
> >
> > Here is an alternative. We could go back to having the hard-coded
> > ../../dtd/document-v10.dtd type of System Identifiers and set up
> > the entity resolver to have a catalog at the top-level of xdocs.
>
> But this would again not work for the CVS - which was the reason for
> this thread :-) At the point "../../dtd/document-v10.dtd" you won't find
> a DTD, but a HTML file about the CVS data of this file.

In my opinion we should not be driven by that use case. Why would
someone try to use a web browser via ViewCVS web application to
view a raw XML file, and then complain that there are bits missing?

> > That is preferable to retrieving a mass of DTD stuff across
> > the network every time that someone looks at a document.
>
> The question is - and I can't answer it - are this really masses?
Hard to estimate, but here are some clues:
----------
Average xdocs/*.xml size is roughly 12 kB

With document-v10 there are 2 additional downloads:
  document-v10.dtd = 20 kB
  characters.ent = 32 kB

With document-v12 there are 9 additional downloads, as it is
more modular (and the faq and changes DTDs add even more modules):
  document-v12.dtd = 8 kB
  document-v12.mod = 16 kB
  common-charents-v10.mod = 4 kB
  iso*.pen entity sets (6 files) = 40 kB
----------

So, if that is not a big impact on Apache infrastructure,
then perhaps we should put the DTDs at somewhere.apache.org.

If we did that, then we should not do the hard-coded local
System Identifier thing - just let those users suffer the
network overhead.

The Forrest project would need to participate in this discussion.
They are currently managing the DTDs. There is also the issue of
duplication of these between Cocoon and Forrest. It might be better
that they are part of Cocoon and are made available in a way that
Forrest can utilise them for its various needs (during the build by
Ant, during the command-line docs build, while running as a webapp,
and augmented by other projects with their own DTDs, etc.).

I see the need for a Proposal, but I also feel that my case of
volunteeritis is worsening. So if it is up to me then it might
take some time.

--David
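
The size estimates in the message can be totalled with a short sketch like the one below. It is only an illustration, not part of the original post: the sizes are the rough kB figures quoted above, the names (`DTD_SETS`, `dtd_overhead_kb`, `overhead_ratio`) are made up for this example, and it assumes the worst case where a tool with no entity resolver and no cache fetches every DTD file each time a document is read.

```python
# Rough sizes in kB, copied from the estimates quoted in the message.
AVERAGE_XDOC_KB = 12

# Files that a parser without an entity resolver would fetch over the network.
DTD_SETS = {
    "document-v10": {
        "document-v10.dtd": 20,
        "characters.ent": 32,
    },
    "document-v12": {
        "document-v12.dtd": 8,
        "document-v12.mod": 16,
        "common-charents-v10.mod": 4,
        "iso*.pen entity sets (6 files)": 40,
    },
}


def dtd_overhead_kb(dtd_set: str) -> int:
    """Total kB of DTD material fetched for one document (worst case, no cache)."""
    return sum(DTD_SETS[dtd_set].values())


def overhead_ratio(dtd_set: str) -> float:
    """DTD download size relative to the average document size."""
    return dtd_overhead_kb(dtd_set) / AVERAGE_XDOC_KB


if __name__ == "__main__":
    for name in DTD_SETS:
        print(f"{name}: {dtd_overhead_kb(name)} kB of DTDs, "
              f"about {overhead_ratio(name):.1f} times the average document")
```

Under those assumptions the DTD material is several times larger than the average document it describes, which is the concern raised above about serving it over the network instead of resolving it locally.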