commons-dev mailing list archives

From "Michael Smith" <mich...@iammichael.org>
Subject RE: [collections][PATCH] SequencedHashMap violates Map contract, has poor performance -- BufferCache, SequencedHastable, MRUMap and LRUMap .
Date Mon, 04 Feb 2002 05:21:41 GMT
Aaron Smuts [mailto:aaron.smuts3@verizon.net] wrote:
> I'm looking for new memory management algorithms for JCS.  The LRU I'm
> using, which is basically the LRUStore I added, is good, but I'd like
> something more efficient for huge collections of objects.  Basically,
> every get results in a reshuffling of some retrieved item to the front,
> unless the item is already the front runner.  Since the likelihood of
> the retrieved item already being the first is small, I'd like to find
> something even more efficient.

But that "move to front" operation is essentually six lines of code in
addition to the hash table lookup.  Might have trouble getting more
efficient than that.
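
For the record, it's roughly the following, assuming the entries are
threaded through a doubly linked list the way SequencedHashMap does it
(names here are made up, untested):

    // after the hash lookup has found the entry...
    private void moveToFront(Entry entry) {
        if (sentinel.next == entry) {
            return;                       // already the front runner
        }
        // unlink the entry from its current position
        entry.prev.next = entry.next;
        entry.next.prev = entry.prev;
        // relink it directly after the sentinel (the MRU end)
        entry.next = sentinel.next;
        entry.prev = sentinel;
        sentinel.next.prev = entry;
        sentinel.next = entry;
    }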

> The cache keeps pushing items to the front.  When new items arrive they
> are pushed off to disk in a group of say 5.  The memory cache plugs into
> the cache hub.  The hub has a bunch of auxiliary caches plugged in and
> manages expiration and the transmission of items to the auxiliaries
> based on the element and regional attributes of the cache.  If one of
> the auxiliaries is of the disk type, the overflow from the memory cache
> (the LRU) is put to disk.  These puts are queued in a very efficient
> manner, first going to a purgatory and then to disk (take a look).

Sounds like the real problem is figuring out which items are "safe" to
move to disk, and keeping track of those items once they've been written.
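
The bookkeeping side could be as simple as a map from key to disk
position, so a miss in the memory cache can still be served from the
spool.  Just a sketch of the idea, not what JCS actually does (names
made up, untested):

    import java.util.HashMap;
    import java.util.Map;

    // Remembers where evicted entries were written, so a get() that
    // misses the memory cache can still find the item on disk.
    public class OverflowIndex {
        private final Map spooled = new HashMap();  // key -> disk position

        public synchronized void recordSpill(Object key, long position) {
            spooled.put(key, new Long(position));
        }

        // returns the disk position, or null if never written out
        public synchronized Long lookup(Object key) {
            return (Long) spooled.get(key);
        }

        public synchronized void remove(Object key) {
            spooled.remove(key);
        }
    }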

> I'd like to avoid the amount of shuffling of memory elements somehow.
> Perhaps by keeping an integer tally of the access and having a
> background thread handling the overflow, so the retrievals are almost
> at map time.  An algorithm based on the current access number and the
> total accesses vs the total number of items could be used.  This may
> result in a few less operations in the synchronous get operation.

when you say "map" time, I assume you mean "hash" map time -- that is,
O(1).  An interesting idea.  I'm not sure how much it would buy you
though if you have to amortize the cost out.  Probably depends greatly
on the frequency of scans looking for items to offload, and the
frequency of retrievals.  I probably wouldn't use a background thread
though...  just perform the scan every n retrievals or every 2n
retrieveals.  That probably helps things be a little more deterministic
and help amortize the cost of the operations.
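
To make that concrete, the get() path could look something like this --
bump a counter, and every n retrievals sweep out anything accessed less
often than average.  Off the top of my head, untested, all names made up:

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    // Sketch: no reshuffling on get(); just count accesses and run an
    // eviction scan once every SCAN_INTERVAL retrievals.
    public class CountingCache {
        private static final int SCAN_INTERVAL = 1000;  // the "n" above
        private final Map entries = new HashMap();      // key -> CacheEntry
        private long totalGets = 0;

        private static class CacheEntry {
            Object value;
            long accessCount;
            CacheEntry(Object value) { this.value = value; }
        }

        public synchronized Object get(Object key) {
            CacheEntry e = (CacheEntry) entries.get(key);
            if (e == null) {
                return null;
            }
            e.accessCount++;                  // O(1) on the hot path
            if (++totalGets % SCAN_INTERVAL == 0) {
                evictCold();                  // amortized over n gets
            }
            return e.value;
        }

        // offload anything accessed less often than the per-item average
        // (the "total accesses vs the total number of items" heuristic)
        private void evictCold() {
            long average = totalGets / Math.max(1, entries.size());
            for (Iterator it = entries.entrySet().iterator(); it.hasNext();) {
                Map.Entry me = (Map.Entry) it.next();
                CacheEntry e = (CacheEntry) me.getValue();
                if (e.accessCount < average) {
                    // here the key/value would be queued to the disk auxiliary
                    it.remove();
                }
            }
        }
    }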

> Since this isn't a bottleneck, I'm not too concerned with it, but it is
> an interesting problem that someone might want to undertake.

I seem to remember seeing someone implementing a scalable hashtable of
sorts that offloaded infrequently used items to disk as part of a PhD
project or something like that.  It was a few years ago though, and I
can't find a reference to it now.  Too bad.  It certainly is an
interesting problem, and one I'd probably enjoy working on.  I'm just
not sure when I'll have the time to work on such a thing.

regards,
michael


