From: "Aaron Smuts"
To: "'Jakarta Commons Developers List'"
Subject: RE: [collections][PATCH] SequencedHashMap violates Map contract, has poor performance -- BufferCache, SequencedHastable, MRUMap and LRUMap
Date: Mon, 4 Feb 2002 01:44:11 -0500

> -----Original Message-----
> From: Michael Smith [mailto:michael@iammichael.org]
> Sent: Monday, February 04, 2002 12:22 AM
> To: Jakarta Commons Developers List
> Subject: RE: [collections][PATCH] SequencedHashMap violates Map
> contract, has poor performance -- BufferCache, SequencedHastable,
> MRUMap and LRUMap
>
> Aaron Smuts [mailto:aaron.smuts3@verizon.net] wrote:
> > I'm looking for new memory management algorithms for JCS. The LRU
> > I'm using, which is basically the LRUStore I added, is good, but I'd
> > like something more efficient for huge collections of objects.
> > Basically, every get results in a reshuffling of the retrieved item
> > to the front, unless the item is already the front runner. Since the
> > likelihood of the retrieved item already being the first is small,
> > I'd like to find something even more efficient.
>
> But that "move to front" operation is essentially six lines of code in
> addition to the hash table lookup. It might be hard to get more
> efficient than that.

Ya. (A sketch of that operation is just below, for reference.)

> > The cache keeps pushing items to the front. When new items arrive,
> > they are pushed off to disk in a group of, say, 5. The memory cache
> > plugs into the cache hub. The hub has a bunch of auxiliary caches
> > plugged in and manages expiration and the transmission of items to
> > the auxiliaries based on the element and regional attributes of the
> > cache. If one of the auxiliaries is of the disk type, the overflow
> > from the memory cache (the LRU) is put to disk. These puts are
> > queued in a very efficient manner, first going to a purgatory and
> > then to disk (take a look).
>
> Sounds like the real problem is figuring out which items are "safe" to
> move to disk, and keeping track of those items once they've been
> written.

Yes. I want a LeastFrequentlyUsed store, not an LRU store.

As for keeping track of written items, I have a few stable, efficient
solutions. The indexed disk cache works very well and is far faster
than any b-tree.
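For concreteness, the "move to front" step Michael mentions looks
roughly like this. A minimal sketch, assuming entries are threaded on a
doubly linked list with a sentinel header; the names (LruList, Entry,
header) are hypothetical, not the actual SequencedHashMap internals:

  class LruList {
      static class Entry {
          Object key, value;
          Entry prev, next;
      }

      // Sentinel node: header.next is the most recently used entry.
      private final Entry header = new Entry();
      { header.next = header; header.prev = header; }

      // The "six lines": unlink the entry from its current position,
      // then relink it just after the sentinel.
      void moveToFront(Entry e) {
          if (header.next == e) {
              return; // already the most recently used
          }
          e.prev.next = e.next;   // unlink
          e.next.prev = e.prev;
          e.next = header.next;   // relink at the front
          e.prev = header;
          header.next.prev = e;
          header.next = e;
      }
  }

All pointer assignments, no allocation or scanning, which is why it is
hard to beat on top of the O(1) hash lookup.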
> > I'd like to avoid the amount of shuffling of memory elements
> > somehow. Perhaps by keeping an integer tally of the accesses and
> > having a background thread handle the overflow, so the retrievals
> > are almost at map time. An algorithm based on the current access
> > number and the total accesses vs. the total number of items could
> > be used. This may result in a few less operations in the
> > synchronous get operation.
>
> When you say "map" time, I assume you mean "hash" map time -- that
> is, O(1).

Ya.

> An interesting idea. I'm not sure how much it would buy you, though,
> if you have to amortize the cost out. It probably depends greatly on
> the frequency of scans looking for items to offload, and the
> frequency of retrievals. I probably wouldn't use a background thread,
> though... just perform the scan every n retrievals or every 2n
> retrievals. That probably helps things be a little more deterministic
> and helps amortize the cost of the operations.

Ya. Except that items will hang around till they are used or spooled.
When we consider expiration, you might want some cleanup on a
stable-sized cache, to proactively get some memory back. Right now the
cache hub handles this on get. There is some motivation for a
background process here . . .

> > Since this isn't a bottleneck, I'm not too concerned with it, but
> > it is an interesting problem that someone might want to undertake.
>
> I seem to remember seeing someone implementing a scalable hashtable
> of sorts that offloaded infrequently used items to disk as part of a
> PhD project or something like that. It was a few years ago, though,
> and I can't find a reference to it now. Too bad. Hmmn. It certainly
> is an interesting problem, and one I'd probably enjoy working on. I'm
> just not sure when I'll have the time to work on such a thing.

I'd like to implement an LFU, rather than an LRU, cache, but the
algorithms can get messy. If you know of any clean systems, point me
that way. I'll look around.

Something simple could be done: record the total number of cache
accesses at each element's insertion time, and weigh the element's own
access count against the cache accesses that have happened since, as a
means for background removal. This might be easier to implement. The
least-accessed item could be determined, but you have to deal with
constantly incoming items and changing totals:

  score = ((current access total - insertion-time access total)
           / total number of items) / item access total

A background thread could compare scores over time to see whether an
item was getting worse. Anything below a certain score, and/or with
two drops, could be removed. For gets you only have to update the
access number. All you'd need would be the wrapped objects in a hash.
You'd just have to deal with the messy business of iterating through
the hashmap without hurting performance. (A sketch is at the end of
this message.)

Coming up with a decent scoring equation is tricky in cases where the
gets are evenly distributed. Or you could just record, per element,
the access total at its last access: the element with the smallest
number, given the total number of items, gets removed, if you can keep
that stable enough. But this is just LRU . . . LRU is cheap, easy, and
effective, but not as sophisticated as you might want. LFU is a
different animal.

I'm just thinking out loud.

Aaron

> regards,
> michael
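Here is a rough sketch of that scoring scheme, purely to make the idea
concrete. All names (LfuScoringCache, Wrapper, sweep) are hypothetical
and not JCS code, and it reads the formula as "expected accesses per
item since insertion, divided by this item's accesses" -- so a high
score marks a relatively underused element, and the sweep evicts the
highest scorers:

  import java.util.HashMap;
  import java.util.Iterator;
  import java.util.Map;

  // Gets only bump counters; a periodic sweep computes
  //   score = ((totalNow - insertionTotal) / size) / elementAccesses
  // and evicts the elements accessed least, relative to the average,
  // since they were inserted.
  class LfuScoringCache {

      private static class Wrapper {
          final Object value;
          final long insertionTotal; // global access total at insertion
          long accesses = 1;         // accesses to this element
          Wrapper(Object value, long insertionTotal) {
              this.value = value;
              this.insertionTotal = insertionTotal;
          }
      }

      private final Map map = new HashMap();
      private long totalAccesses;

      public synchronized Object get(Object key) {
          totalAccesses++;
          Wrapper w = (Wrapper) map.get(key);
          if (w == null) {
              return null;
          }
          w.accesses++; // the only bookkeeping on the hot path
          return w.value;
      }

      public synchronized void put(Object key, Object value) {
          map.put(key, new Wrapper(value, totalAccesses));
      }

      // Background pass: remove any element whose score exceeds the
      // cutoff. Could equally be called every n gets, per Michael's
      // suggestion, instead of from a background thread.
      public synchronized void sweep(double cutoff) {
          int size = map.size();
          if (size == 0) {
              return;
          }
          for (Iterator it = map.values().iterator(); it.hasNext();) {
              Wrapper w = (Wrapper) it.next();
              double expected =
                  (double) (totalAccesses - w.insertionTotal) / size;
              if (expected / w.accesses > cutoff) {
                  it.remove();
              }
          }
      }
  }

Picking the cutoff is the hard part Aaron describes, especially when
the gets are evenly distributed and every element scores near 1.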