forrest-dev mailing list archives

From "Tim Williams" <william...@gmail.com>
Subject Re: Locationmap Caching Help (FOR-732/711)
Date Thu, 30 Mar 2006 17:45:01 GMT
On 3/30/06, Ross Gardler <rgardler@apache.org> wrote:
> Tim Williams wrote:
> > I've been looking into FOR-711 [1]/FOR-732 [2] and I'm in need of some
> > new thoughts on the subject.  My "fix" for FOR-732 (changes to
> > locationmap having no effect) has effectively reversed much of the
> > benefit of caching the responses to begin with.  I'll explain...
>
> Sorry for not responding sooner, I've been trying to find the time to do
> so properly. I'm not finding any time at the moment, so I'll just give
> an "off the top of my head" response, in the hope that it helps in some
> small way.
>
> > The highlights of what I've done:
> > o) Created isValid() on AbstractNode.
> > o) Implemented isValid() on each node as appropriate.
> > o) LocationMapModule now tests whether caching is turned on &&
> > isValid() before returning a cached result.
> > o) If the traversed isValid() returns an invalid result back to the
> > LocationMapModule, the cache is flushed.
> >
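The four steps above can be sketched in Java roughly as follows. Only AbstractNode, isValid(), LocationMapModule and the flush-on-invalid behaviour come from the message. MountNode, GroupNode, LocationmapFile, the timestamp-based validity test and the reload Supplier are hypothetical, and resolve() is a placeholder for the real tree traversal, so treat this as an illustration of the design rather than Forrest's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a locationmap file; lastModified simulates its timestamp.
class LocationmapFile {
    long lastModified;
    LocationmapFile(long lastModified) { this.lastModified = lastModified; }
}

// isValid() on the base node: a node is valid only if it and all of its children are.
abstract class AbstractNode {
    final List<AbstractNode> children = new ArrayList<>();

    boolean isValid() {
        if (!isSelfValid()) return false;
        for (AbstractNode child : children) {
            if (!child.isValid()) return false; // stop at the first invalid node
        }
        return true; // nothing stale: every node in the tree was visited
    }

    // Each node type implements its own notion of validity.
    abstract boolean isSelfValid();
}

// Hypothetical node for a mounted locationmap: valid while its file is unchanged.
class MountNode extends AbstractNode {
    private final LocationmapFile file;
    private final long loadedAt;

    MountNode(LocationmapFile file) {
        this.file = file;
        this.loadedAt = file.lastModified;
    }

    @Override
    boolean isSelfValid() { return file.lastModified == loadedAt; }
}

// Hypothetical node with no external dependency of its own.
class GroupNode extends AbstractNode {
    @Override
    boolean isSelfValid() { return true; }
}

// Hypothetical sketch of the module logic, not the real LocationMapModule.
class LocationMapModuleSketch {
    private final boolean cachingOn;
    private final Supplier<AbstractNode> loader; // rebuilds the tree from the files
    private final Map<String, String> cache = new HashMap<>();
    private AbstractNode root;
    int resolutions = 0; // counts full (uncached) resolutions

    LocationMapModuleSketch(boolean cachingOn, Supplier<AbstractNode> loader) {
        this.cachingOn = cachingOn;
        this.loader = loader;
        this.root = loader.get();
    }

    String getAttribute(String hint) {
        if (cachingOn) {
            if (root.isValid()) {
                String cached = cache.get(hint);
                if (cached != null) return cached; // caching on && valid: trust the cache
            } else {
                cache.clear();       // some node is stale: flush the whole cache
                root = loader.get(); // and reload the locationmap tree
            }
        }
        String value = resolve(hint);
        if (cachingOn) cache.put(hint, value);
        return value;
    }

    // Placeholder for the real traversal that matches the hint.
    private String resolve(String hint) {
        resolutions++;
        return "resolved:" + hint;
    }
}
```

Note that every cached lookup pays for a full isValid() walk of the tree, which is where the cost discussed below comes from.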
> > The results I'm seeing: Slow, but the correct behavior... did I
> > mention slow [3]?
>
> Hmmm... that's quite a performance hit. I don't really understand why it
> is so bad with caching turned on. It's probably a really stupid question
> but are we using the right kind of data structure for the cache data?

Because it's making ~315 validity tests per request.  Even with caching
turned on, it takes a while to test validity that many times.  We're
probably not using the perfect data structure (HashMap), but I don't
think it's the cause of this problem.  We could eke out some performance
by changing the data structure, maybe to something MRU-style, but it's
not the big bottleneck right now.  The traversals are quick and the
lookups are quick; it's doing them 315 times per request that sucks.
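The cost is multiplicative: if every cached lookup re-validates the whole tree, a request whose locationmaps are all unchanged performs (lookups × nodes) validity tests. The sketch below uses a hypothetical CountingNode and made-up tree and lookup sizes purely to illustrate that arithmetic; it is not a reconstruction of how the ~315 figure breaks down.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node whose isValid() walks the whole subtree and counts each test.
class CountingNode {
    static long checks = 0;
    final List<CountingNode> children = new ArrayList<>();

    boolean isValid() {
        checks++; // one validity test for this node
        for (CountingNode child : children) {
            if (!child.isValid()) return false;
        }
        return true; // nothing stale, so every node in the tree was tested
    }
}

class ValidityCost {
    // Builds a flat tree: one root with (nodeCount - 1) children.
    static CountingNode tree(int nodeCount) {
        CountingNode root = new CountingNode();
        for (int i = 1; i < nodeCount; i++) root.children.add(new CountingNode());
        return root;
    }

    // One simulated request: every cached lookup re-validates the whole tree first.
    static long checksPerRequest(CountingNode root, int lookupsPerRequest) {
        CountingNode.checks = 0;
        for (int i = 0; i < lookupsPerRequest; i++) root.isValid();
        return CountingNode.checks;
    }
}
```

For example, a 5-node tree and 7 lookups per request gives 35 validity tests, even though each individual test is cheap.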

> One other thought: are we checking all isValid() methods or just the one
> for the file in which the cached data was found? There is no need to
> traverse the cache files *below* the one we currently use for giving a
> result.

We currently stop only when an invalid file is found.  Otherwise all
nodes are traversed.  If we had insight into what file a given hint
was located in, we could just test that directly without needing to do
the whole AbstractNode.isValid()-recursive thing.  Inside
getAttribute() we don't have insight into which locationmap returned
the value to us.
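A minimal sketch of that alternative, assuming the resolver could report which locationmap files it read before finding a match (exactly the insight getAttribute() currently lacks). Each cached entry remembers those files and their timestamps, so a lookup re-tests only them; files that would only be consulted after the match, i.e. *below* it, are ignored, as Ross suggests. LmFile, CachedHint and PerHintCache are hypothetical names.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a locationmap file; lastModified simulates its timestamp.
class LmFile {
    long lastModified;
    LmFile(long lastModified) { this.lastModified = lastModified; }
}

// A cached answer plus the files consulted to produce it (up to and including
// the one that matched), each with the timestamp it had at that moment.
final class CachedHint {
    final String value;
    final Map<LmFile, Long> consulted = new IdentityHashMap<>();

    CachedHint(String value, List<LmFile> consultedFiles) {
        this.value = value;
        for (LmFile f : consultedFiles) consulted.put(f, f.lastModified);
    }
}

class PerHintCache {
    private final Map<String, CachedHint> entries = new HashMap<>();
    int checks = 0; // validity tests performed, for illustration

    // Assumes the resolver can report which files it read before finding a match.
    void put(String hint, String value, List<LmFile> consultedFiles) {
        entries.put(hint, new CachedHint(value, consultedFiles));
    }

    // Returns the cached value, or null if it is missing or one of its files changed.
    String get(String hint) {
        CachedHint entry = entries.get(hint);
        if (entry == null) return null;
        for (Map.Entry<LmFile, Long> e : entry.consulted.entrySet()) {
            checks++;
            if (e.getKey().lastModified != e.getValue()) {
                entries.remove(hint); // evict only this hint, not the whole cache
                return null;
            }
        }
        return entry.value;
    }
}
```

With this shape, the validity tests per lookup are bounded by the number of files consulted for that hint rather than by the size of the whole tree, and a stale entry is evicted on its own instead of flushing the cache.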

> > So what to do and what am I asking?
> > ... even if we were to
> > implement our own store ... it seems to me it's the sheer number
> > of accesses that's the issue rather than storage location.
>
> ...
>
> >  Having a
> > configurable timeout (so that validity tests are only done every X
> > number of minutes) would help but seems ultra-hacky.
>
> Why hacky? Isn't it standard practice for a cache to have a configurable
> TTL? Then you use it to fine-tune performance, adjusting the cache
> for different requests.

Yeah, our cache isn't that sophisticated.  When I talk about a timeout,
I mean a timeout on the *whole* cache (HashMap).  I think it's standard
practice to evict individual items from the store based on TTL, not to
potentially invalidate the whole thing.
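To make the distinction concrete, here is a sketch of the whole-cache timeout described above: the tree-wide validity walk runs at most once per configurable interval, and a failed walk flushes everything. ThrottledCache, the injected clock and the BooleanSupplier standing in for the root node's isValid() are all hypothetical. A per-item TTL, by contrast, would store an expiry time with each entry and drop entries individually.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical whole-cache timeout: between walks, cached values are trusted
// even if a locationmap has changed in the meantime.
class ThrottledCache {
    private final Map<String, String> entries = new HashMap<>();
    private final long checkIntervalMillis;
    private final LongSupplier clock;          // injected so the sketch is testable
    private final BooleanSupplier treeIsValid; // stand-in for the tree-wide isValid()
    private long lastCheck;
    int walks = 0; // how many full validity walks actually ran

    ThrottledCache(long checkIntervalMillis, LongSupplier clock, BooleanSupplier treeIsValid) {
        this.checkIntervalMillis = checkIntervalMillis;
        this.clock = clock;
        this.treeIsValid = treeIsValid;
        this.lastCheck = clock.getAsLong();
    }

    String get(String hint) {
        long now = clock.getAsLong();
        if (now - lastCheck >= checkIntervalMillis) {
            lastCheck = now;
            walks++;
            if (!treeIsValid.getAsBoolean()) {
                entries.clear(); // one stale file invalidates the *whole* cache
            }
        }
        return entries.get(hint);
    }

    void put(String hint, String value) {
        entries.put(hint, value);
    }
}
```

The trade-off is visible in the sketch: a locationmap edit can go unnoticed for up to one interval, in exchange for at most one validity walk per interval instead of one per lookup.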

> > So I'm asking
> > for new thoughts, ideas, suggestions for alternative approaches.
>
> I'm moving nearer and nearer to needing two configurations for my
> Forrest projects. One for "in development" and one for "in production".
>
> The in-development version has profiling enabled, and would have caching
> enabled. The in-production version would have both turned off (although in a
> dynamic site we would need to leave caching on).

I think my tests have demonstrated that you'll actually get some
significant benefits from leaving caching on for static builds as well.
A fresh static build of the site took ~60 sec without caching and
~30 sec with it.

Thanks for the response...
--tim
