Return-Path: Delivered-To: apmail-xml-cocoon-dev-archive@xml.apache.org Received: (qmail 23217 invoked by uid 500); 21 May 2002 13:33:59 -0000 Mailing-List: contact cocoon-dev-help@xml.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: list-post: Reply-To: cocoon-dev@xml.apache.org Delivered-To: mailing list cocoon-dev@xml.apache.org Received: (qmail 23206 invoked from network); 21 May 2002 13:33:59 -0000 Reply-To: From: "Berin Loritsch" To: Subject: RE: [RT]: Calculating the cache key Date: Tue, 21 May 2002 09:33:49 -0400 Message-ID: <001701c200cc$2a160610$ac00a8c0@Gabriel> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2627 In-Reply-To: X-Mimeole: Produced By Microsoft MimeOLE V6.00.2600.0000 Importance: Normal X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N > From: Carsten Ziegeler [mailto:cziegeler@s-und-n.de] > > Berin Loritsch wrote: > > > > 'h'*31^(39-1) + 't'*31^(39-2) .... + 'l' > > > > I'll let you do the math yourself. > > Yes, but what if my strings are, let's say 1000 characters > long? And looking at the code for example in the xslt > transformer or some other places this is a realistic value. My point exactly. The longer the string, the longer it takes to compute the hash. The longer it takes to compare the strings exactly (string.equals()). You get the picture. Now, out of that 1000 characters, how many actual dirivations do you get? Extending the length of the key merely to ensure unique hash values is a hack. Eventually you will get collisions. The real nead is for a _quick_ validation, and a unique key. > The current caching algorithm uses this compound key with the > exception that the longs are used to build a string: > > "FileGenerator:fe01b342;XSLTTransformer:fb021431;XSLTTransform > er:fb02542e; > HTMLSerializer:1" Yes, but all that information can be expressed in the integers themselves. The strings are useful for debugging purposes, but the longer they get the more they hurt performance, and the more they invalidate the value of caching. The Integer to String conversion is also expensive. We are doing things in strings that would be better done with numbers. > > This string can than simply be used for comparissions etc. It is quicker to match 4 ints than 1000 chars. > > The string approach was used as the algorithm does not know > how many components the key will have beforehand. So the > array for holding all > the components has to be build dynamically, which decreases > performance a little bit. Hense the compound key approach. A new entry in the compound key for each component. The component can keep track of what the actual entries for it mean. The compound key does not have to be limited to only 4 entries. Just like a string but faster. > > Of course, I see your point, that comparing a (hashed) long > value is much faster than comparing strings. > > But to build the hash value a lot of string operations are > used anyway and the hashing takes time. And all this has to > be done to compare two > keys. > So is it really still faster than simply concatenating > strings and comparing them? > > I really don't know - perhaps we could write a performance > test case and see what's faster. By using the compound key (a small list of longs or ints), the Components themselves have a unique number--combined with the mapping of the values themselves to the integer. Let's say we have the FileGenerator (most likely to have the highest number of unique resources). It is quicker to match "hello.xml" than it is to match "FileGenerator:fb0234ad" It might even be quicker to store the filenames in a list, and use the index of the entry as part of the unique key. But testing will bear this out. --------------------------------------------------------------------- To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org For additional commands, email: cocoon-dev-help@xml.apache.org