esme-dev mailing list archives

From Markus Kohler <>
Subject Re: Performance update: Message size in memory
Date Wed, 09 Dec 2009 08:29:37 GMT
Hi Vassil,

See comments below...

On Wed, Dec 9, 2009 at 8:46 AM, Vassil Dichev <> wrote:

> Markus,
> First of all, a note about the KB/message statistics: this is only
> valid as long as you get messages from the cache! Currently the cache
> size is set to 10,000, so you will see a drop in memory usage for
> message numbers which exceed this size.

Yes, I expected this. But I think you agree that for high performance with
lots of users we need to make sure that we can cache as much as possible.

> Processing messages would
> also necessarily become slower.
> The simplest strategies for the stemmer would be:
> 1. Move the stemmer to the companion object
> 2. Create a new stemmer every time it's needed
> In a naive test comparing 100,000 invocations of stem on the same
> stemmer with creating 100,000 stemmer objects, it seems that
> instantiation takes almost twice as long. So I'm not sure contention
> would be much of an issue; besides, the only time a stemmer is needed
> is for search and the word frequency cloud. These are not specific to
> a particular message, so they can be (and should be) moved to the
> companion object, too.
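
To make the two strategies concrete, here is a rough Scala sketch. `ToyStemmer` is a hypothetical stand-in with mutable internal state (not the stemmer ESME actually uses), and `Message`, `tokenize`, `stems` and `stemsWithNewStemmer` are names invented for illustration. It only shows the shape of the trade-off: a shared instance needs a lock, a fresh instance costs an allocation per call.

```scala
// Hypothetical stand-in for a real stemmer. It reuses a mutable buffer,
// which is why a shared instance must not be used concurrently.
class ToyStemmer {
  private val buffer = new java.lang.StringBuilder

  def stem(word: String): String = {
    buffer.setLength(0)
    buffer.append(word.toLowerCase)
    val current = buffer.toString
    // Toy rules only: strip a trailing "ing" or a trailing "s".
    if (current.length > 4 && current.endsWith("ing")) buffer.setLength(buffer.length() - 3)
    else if (current.length > 3 && current.endsWith("s")) buffer.setLength(buffer.length() - 1)
    buffer.toString
  }
}

class Message(val text: String) {
  // Strategy 2: create a new stemmer every time it is needed (no locking).
  def stemsWithNewStemmer: Seq[String] = {
    val stemmer = new ToyStemmer
    Message.tokenize(text).map(stemmer.stem)
  }
}

object Message {
  // Strategy 1: a single stemmer in the companion object, guarded by a lock.
  private val sharedStemmer = new ToyStemmer

  def tokenize(text: String): Seq[String] =
    text.split("\\s+").filter(_.nonEmpty).toSeq

  def stems(text: String): Seq[String] = sharedStemmer.synchronized {
    tokenize(text).map(sharedStemmer.stem)
  }
}
```

With the shared variant, contention only matters if many threads stem at the same time, which is the open question in this thread.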

Yes that makes a lot of sense. Is the stemming currently done within the
thread that updates the UI?
Stemming could then be batched (update the word frequency only every n
I would rather avoid creating a new stemmer each time.
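
A minimal sketch of what batching the word frequency update could look like, assuming that n counts messages. `BatchedWordFrequency`, `batchSize` and the `stem` function parameter are hypothetical names for illustration and are not part of ESME.

```scala
import scala.collection.mutable

// Hypothetical helper: buffers message texts and only updates the word
// frequencies once `batchSize` messages have accumulated.
class BatchedWordFrequency(batchSize: Int, stem: String => String) {
  private val pending = mutable.ArrayBuffer.empty[String]
  private val counts = mutable.Map.empty[String, Int].withDefaultValue(0)

  def add(text: String): Unit = synchronized {
    pending += text
    if (pending.size >= batchSize) flush()
  }

  // Folds all pending messages into the counts in one pass.
  def flush(): Unit = synchronized {
    for {
      text <- pending
      word <- text.split("\\s+") if word.nonEmpty
    } counts(stem(word)) += 1
    pending.clear()
  }

  // Immutable copy of the current counts, e.g. for rendering the word cloud.
  def snapshot: Map[String, Int] = synchronized { counts.toMap }
}
```

The stemmer is then only touched inside `flush()`, so a single shared instance would be used from one place at a time.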

> Furthermore, search is done in a compass
> transaction anyway.

I've also seen that Lucene has some potential issues with finalizers, e.g.
they use large finalizable objects (IndexWriter IIRC). Is the index updated
for each message? I think it would also make sense to batch those updates if

> We could also have some type of pooling, but I'm not sure how
> efficient it would be. This definitely needs some benchmarks before we
> try to optimize too much.
> What do you think?
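
For reference, the pooling idea could be as small as the sketch below. `SimplePool`, `withInstance` and `idleCount` are hypothetical names, and the pool is unbounded, so this is an illustration under those assumptions rather than a tuned implementation.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical generic pool: borrows an idle instance if one exists,
// otherwise creates a new one, and hands it back after use.
class SimplePool[A](create: () => A) {
  private val idle = new ConcurrentLinkedQueue[A]

  def withInstance[B](f: A => B): B = {
    // poll() returns null when the queue is empty; Option(null) is None.
    val instance = Option(idle.poll()).getOrElse(create())
    try f(instance)
    finally idle.offer(instance)
  }

  def idleCount: Int = idle.size
}
```

Whether this beats plain instantiation or a synchronized shared instance is exactly what a benchmark would have to show.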

Yes. It's impossible to make decisions about which tradeoffs to make as
long as we don't have an ESME instance running with enough active users
(and with detailed enough performance monitoring enabled).

For now I would therefore go with the easiest possible implementation,

