Mailing-List: xerces-c-dev@xml.apache.org
Message-ID: <019801c08c20$7713c3c0$0215a8c0@ne.mediaone.net>
From: "Jon Smirl"
Subject: Re: DOM Performance
Date: Thu, 1 Feb 2001 02:27:37 -0500

From: "Dean Roddey"
> The Java parser/DOM does this, and I personally think its more complexity
> than its worth, and its worse when you really do end up touching most to all
> of the document. It can create very good looking benchmarks, but I'm not
> convinced its a real world win overall. And of course it would require
> rewriting the entire parser system.

Most real-world XSL stylesheets:

a) do not transcode; the input and output charsets are the same.
b) do not look at the contents of the text nodes. They manipulate the elements and tags a lot, but few do anything to the text nodes other than copy them.

I don't have any data to back this up, but I suspect a lot of DOM programs have the same characteristics. It's not obvious to me that lazy transcoding is significantly worse even if you end up touching most of the document.
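
A minimal sketch of the transcode-on-demand idea, for illustration only. It is not Xerces-C code: `LazyText` and its members are hypothetical names, the source encoding is assumed to be ISO-8859-1 (where each byte maps directly to the same code point), and the target is assumed to be UTF-16 code units. The node keeps the original bytes, so a copy-through never transcodes; the UTF-16 form is built only when something actually reads the text.

```cpp
#include <string>
#include <utility>

// Hypothetical text node that stores the source bytes and transcodes lazily.
class LazyText {
public:
    explicit LazyText(std::string rawLatin1) : raw_(std::move(rawLatin1)) {}

    // Copy-through path: returns the original bytes without transcoding.
    const std::string& raw() const { return raw_; }

    // Read path: transcodes on first use and caches the result.
    const std::u16string& text() const {
        if (!decoded_) {
            cache_.clear();
            cache_.reserve(raw_.size());
            // Assumption: ISO-8859-1 input, so byte value == code point.
            for (unsigned char c : raw_) {
                cache_.push_back(static_cast<char16_t>(c));
            }
            decoded_ = true;
        }
        return cache_;
    }

    // Drops the cached UTF-16 buffer; it is rebuilt on the next text() call.
    void release() const {
        cache_.clear();
        cache_.shrink_to_fit();
        decoded_ = false;
    }

    bool isDecoded() const { return decoded_; }

private:
    std::string raw_;               // bytes as they appeared in the input
    mutable std::u16string cache_;  // transcoded form, built on demand
    mutable bool decoded_ = false;
};
```

A stylesheet that only copies text nodes would call `raw()` and never pay for the transcode, while `release()` gives the caller a way to cap how many transcoded buffers stay resident.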
The transcode-on-demand strategy could let you control the amount of memory used for transcoded buffers instead of forcing it all into memory at once. The smaller memory footprint would allow DOM manipulation of large documents without paging.

In my own case I'm dynamically generating small pages (<30k) and I want more speed anywhere I can get it. It's not just the time spent in the buffer copies (three copies and two transcodes); the OS also spends a lot of time allocating and tracking the memory used for the buffers.

You can't always draw parallels between Java implementations and C, especially when lots of strings are involved. Xalan's DOM changes showed us that.

Jon Smirl
jonsmirl@mediaone.net