lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3393) Implement an optimized LFUCache
Date Fri, 07 Sep 2012 14:40:07 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450667#comment-13450667
] 

Shawn Heisey commented on SOLR-3393:
------------------------------------

bq. This was because of the call to findLowestFrequency in doEviction. If one element has
frequency=0 and all other ones have frequency=maxFreq, findLowestFrequency must inspect every
slot. But this should be avoidable: since doEviction is only called inside put, there is no
need to compute lowestFreq inside doEviction, put will always set lowestFreq=0 in the end.

If indeed I only used doEviction inside put, then lowestFreq would always be 0.  Thinking
generally, in case doEviction did get called anywhere else, perhaps that value should be tracked
rather than computed every time.  There's probably a way to quickly figure out the new value
if the data structure connected to that frequency gets reduced to empty.

If we eliminate the need to cycle through every unused frequency one by one, we should be
able to keep maxFreq at the max cache size minus one, allowing for the greatest granularity
under the current implementation.

Like Yonik, I think having decay is important, but the last implementation was way too aggressive.
 My current thought: Only do it at warm time, and only do it after a configurable time period
has passed from the previous decay or initial cache creation.  Offer an additional option
where it would happen after X commits.  Default the decay to true and the time period to one
hour.  Apply the decay by subtracting one rather than doing a bitshift.  This should keep
things fairly predictable in the short term while also keeping the cache clean over the long
term.

Unit tests for the decay would probably use values <= 5000 milliseconds for the time period.

                
> Implement an optimized LFUCache
> -------------------------------
>
>                 Key: SOLR-3393
>                 URL: https://issues.apache.org/jira/browse/SOLR-3393
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 3.6, 4.0-ALPHA
>            Reporter: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch,
SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on FastLRUCache/ConcurrentLRUCache.
 It could use some serious improvement.  The following project includes an Apache 2.0 licensed
O(1) implementation.  The second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called FastLFUCache
that is modeled on LRUCache.java.  This will (for now) leave the existing LFUCache/ConcurrentLFUCache
implementation in place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message