From: "Aaron Smuts"
To: "'Jakarta Commons Developers List'"
Subject: RE: [collections][PATCH] SequencedHashMap violates Map contract, has poor performance -- BufferCache, SequencedHastable, MRUMap and LRUMap
Date: Mon, 4 Feb 2002 01:44:11 -0500

> -----Original Message-----
> From: Michael Smith [mailto:michael@iammichael.org]
> Sent: Monday, February 04, 2002 12:22 AM
> To: Jakarta Commons Developers List
> Subject: RE: [collections][PATCH] SequencedHashMap violates Map
> contract, has poor performance -- BufferCache, SequencedHastable,
> MRUMap and LRUMap
>
> Aaron Smuts [mailto:aaron.smuts3@verizon.net] wrote:
> > I'm looking for new memory management algorithms for JCS. The LRU
> > I'm using, which is basically the LRUStore I added, is good, but I'd
> > like something more efficient for huge collections of objects.
> > Basically, every get results in a reshuffling of the retrieved item
> > to the front, unless the item is already the front runner. Since the
> > likelihood of the retrieved item already being the first is small,
> > I'd like to find something even more efficient.
>
> But that "move to front" operation is essentially six lines of code in
> addition to the hash table lookup. It might be hard to get more
> efficient than that.

Ya. (A sketch of that operation is just below, for reference.)

> > The cache keeps pushing items to the front. When new items arrive,
> > they are pushed off to disk in a group of, say, 5. The memory cache
> > plugs into the cache hub. The hub has a bunch of auxiliary caches
> > plugged in and manages expiration and the transmission of items to
> > the auxiliaries based on the element and regional attributes of the
> > cache. If one of the auxiliaries is of the disk type, the overflow
> > from the memory cache (the LRU) is put to disk. These puts are
> > queued in a very efficient manner, first going to a purgatory and
> > then to disk (take a look).
>
> Sounds like the real problem is figuring out which items are "safe" to
> move to disk, and keeping track of those items once they've been
> written.

Yes. I want a LeastFrequentlyUsed store, not an LRU store.

As for keeping track of written items, I have a few stable, efficient
solutions. The indexed disk cache works very well and is far faster
than any b-tree.
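For concreteness, the "move to front" step Michael mentions looks
roughly like this. A minimal sketch, assuming entries are threaded on a
doubly linked list with a sentinel header; the names (LruList, Entry,
header) are hypothetical, not the actual SequencedHashMap internals:

  class LruList {
      static class Entry {
          Object key, value;
          Entry prev, next;
      }

      // Sentinel node: header.next is the most recently used entry.
      private final Entry header = new Entry();
      { header.next = header; header.prev = header; }

      // The "six lines": unlink the entry from its current position,
      // then relink it just after the sentinel.
      void moveToFront(Entry e) {
          if (header.next == e) {
              return; // already the most recently used
          }
          e.prev.next = e.next;   // unlink
          e.next.prev = e.prev;
          e.next = header.next;   // relink at the front
          e.prev = header;
          header.next.prev = e;
          header.next = e;
      }
  }

All pointer assignments, no allocation or scanning, which is why it is
hard to beat on top of the O(1) hash lookup.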
> > I'd like to avoid the amount of shuffling of memory elements
> > somehow. Perhaps by keeping an integer tally of the accesses and
> > having a background thread handle the overflow, so the retrievals
> > are almost at map time. An algorithm based on the current access
> > number and the total accesses vs. the total number of items could
> > be used. This may result in a few less operations in the
> > synchronous get operation.
>
> When you say "map" time, I assume you mean "hash" map time -- that
> is, O(1).

Ya.

> An interesting idea. I'm not sure how much it would buy you, though,
> if you have to amortize the cost out. It probably depends greatly on
> the frequency of scans looking for items to offload, and the
> frequency of retrievals. I probably wouldn't use a background thread,
> though... just perform the scan every n retrievals or every 2n
> retrievals. That probably helps things be a little more deterministic
> and helps amortize the cost of the operations.

Ya. Except that items will hang around till they are used or spooled.
When we consider expiration, you might want some cleanup on a
stable-sized cache, to proactively get some memory back. Right now the
cache hub handles this on get. There is some motivation for a
background process here . . .

> > Since this isn't a bottleneck, I'm not too concerned with it, but
> > it is an interesting problem that someone might want to undertake.
>
> I seem to remember seeing someone implementing a scalable hashtable
> of sorts that offloaded infrequently used items to disk as part of a
> PhD project or something like that. It was a few years ago, though,
> and I can't find a reference to it now. Too bad. Hmmn. It certainly
> is an interesting problem, and one I'd probably enjoy working on. I'm
> just not sure when I'll have the time to work on such a thing.

I'd like to implement an LFU, rather than an LRU, cache, but the
algorithms can get messy. If you know of any clean systems, point me
that way. I'll look around.

Something simple could be done: record the total number of cache
accesses at each element's insertion time, and weigh the element's own
access count against the cache accesses that have happened since, as a
means for background removal. This might be easier to implement. The
least-accessed item could be determined, but you have to deal with
constantly incoming items and changing totals:

  score = ((current access total - insertion-time access total)
           / total number of items) / item access total

A background thread could compare scores over time to see whether an
item was getting worse. Anything below a certain score, and/or with
two drops, could be removed. For gets you only have to update the
access number. All you'd need would be the wrapped objects in a hash.
You'd just have to deal with the messy business of iterating through
the hashmap without hurting performance. (A sketch is at the end of
this message.)

Coming up with a decent scoring equation is tricky in cases where the
gets are evenly distributed. Or you could just record, per element,
the access total at its last access: the element with the smallest
number, given the total number of items, gets removed, if you can keep
that stable enough. But this is just LRU . . . LRU is cheap, easy, and
effective, but not as sophisticated as you might want. LFU is a
different animal.

I'm just thinking out loud.

Aaron

> regards,
> michael
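Here is a rough sketch of that scoring scheme, purely to make the idea
concrete. All names (LfuScoringCache, Wrapper, sweep) are hypothetical
and not JCS code, and it reads the formula as "expected accesses per
item since insertion, divided by this item's accesses" -- so a high
score marks a relatively underused element, and the sweep evicts the
highest scorers:

  import java.util.HashMap;
  import java.util.Iterator;
  import java.util.Map;

  // Gets only bump counters; a periodic sweep computes
  //   score = ((totalNow - insertionTotal) / size) / elementAccesses
  // and evicts the elements accessed least, relative to the average,
  // since they were inserted.
  class LfuScoringCache {

      private static class Wrapper {
          final Object value;
          final long insertionTotal; // global access total at insertion
          long accesses = 1;         // accesses to this element
          Wrapper(Object value, long insertionTotal) {
              this.value = value;
              this.insertionTotal = insertionTotal;
          }
      }

      private final Map map = new HashMap();
      private long totalAccesses;

      public synchronized Object get(Object key) {
          totalAccesses++;
          Wrapper w = (Wrapper) map.get(key);
          if (w == null) {
              return null;
          }
          w.accesses++; // the only bookkeeping on the hot path
          return w.value;
      }

      public synchronized void put(Object key, Object value) {
          map.put(key, new Wrapper(value, totalAccesses));
      }

      // Background pass: remove any element whose score exceeds the
      // cutoff. Could equally be called every n gets, per Michael's
      // suggestion, instead of from a background thread.
      public synchronized void sweep(double cutoff) {
          int size = map.size();
          if (size == 0) {
              return;
          }
          for (Iterator it = map.values().iterator(); it.hasNext();) {
              Wrapper w = (Wrapper) it.next();
              double expected =
                  (double) (totalAccesses - w.insertionTotal) / size;
              if (expected / w.accesses > cutoff) {
                  it.remove();
              }
          }
      }
  }

Picking the cutoff is the hard part Aaron describes, especially when
the gets are evenly distributed and every element scores near 1.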