esme-dev mailing list archives

From Markus Kohler <markus.koh...@gmail.com>
Subject Re: Performance update: Message size in memory
Date Wed, 09 Dec 2009 08:29:37 GMT
Hi Vassil,

See comments below...




On Wed, Dec 9, 2009 at 8:46 AM, Vassil Dichev <vdichev@apache.org> wrote:

> Markus,
>
> First of all, a note about the KB/message statistics: this is only
> valid as long as you get messages from the cache! Currently the cache
> size is set to 10,000, so you will see a drop in memory usage for
> message numbers which exceed this size.


Yes, I expected this. But I think you agree that for high performance with
lots of users we need to make sure we can cache as much as possible.


> Processing messages would
> also necessarily become slower.
>
> The simplest strategies for the stemmer would be:
> 1. Move the stemmer to the companion object
> 2. Create a new stemmer every time it's needed
>
> By doing a naive test with 100,000 invocations of stem for the same
> stemmer and creating 100,000 stemmer objects it seems that
> instantiation takes almost twice as long. So I'm not sure contention
> would be much of an issue; besides, the only time a stemmer is needed
> is for search and the word frequency cloud. These are not specific to
> a particular message, so can be (and should be) moved to the
> companion object, too.

Yes, that makes a lot of sense. Is the stemming currently done within the
thread that updates the UI?
Stemming could then be batched (update the word frequency only every n
messages).
I would rather avoid creating a new stemmer each time.
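
A rough sketch of what I mean by a shared stemmer plus batched word
frequency updates. Everything here is an assumption for illustration:
ToyStemmer is a made-up stand-in for the real stemmer, and Message,
record, flush, frequency and batchSize are hypothetical names, not the
actual ESME code.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the real stemmer (assumed not to be thread-safe).
class ToyStemmer {
  def stem(word: String): String = {
    val w = word.toLowerCase
    if (w.length > 4 && w.endsWith("ing")) w.dropRight(3)
    else if (w.length > 3 && w.endsWith("s")) w.dropRight(1)
    else w
  }
}

class Message(val text: String)

object Message {
  // One shared instance in the companion object instead of one per message.
  private val stemmer = new ToyStemmer
  private val pending = mutable.ArrayBuffer.empty[String]
  private val frequencies = mutable.Map.empty[String, Int].withDefaultValue(0)
  val batchSize = 3 // assumed value, would need tuning

  // Queue the message; stem only once a full batch has accumulated.
  def record(msg: Message): Unit = synchronized {
    pending += msg.text
    if (pending.size >= batchSize) flush()
  }

  // Stem all pending messages in one go; the lock guards the shared stemmer.
  def flush(): Unit = synchronized {
    for (text <- pending; word <- text.split("\\s+") if word.nonEmpty)
      frequencies(stemmer.stem(word)) += 1
    pending.clear()
  }

  def frequency(stem: String): Int = synchronized { frequencies(stem) }
}
```

The lock is only taken once per recorded message, and the stemming work
itself happens once per batch, so contention on the shared stemmer
should stay low. Whether that holds under real load needs measuring.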


> Furthermore, search is done in a compass
> transaction anyway.

I've also seen that Lucene has some potential issues with finalizers, e.g.
they use large finalizable objects (IndexWriter, IIRC). Is the index updated
for each message? I think it would also make sense to batch those updates if
possible.
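
Batching the index updates could look roughly like this. Note that the
Index trait is a hypothetical interface I made up for the sketch; it
does not show the real Compass or Lucene calls. BatchingIndexer and
CountingIndex are likewise illustrative names.

```scala
import scala.collection.mutable

// Hypothetical interface; the real Compass/Lucene calls are not shown here.
trait Index {
  def addAll(docs: Seq[String]): Unit
}

// Buffers documents and writes them to the index one batch at a time.
class BatchingIndexer(index: Index, batchSize: Int) {
  private val buffer = mutable.ArrayBuffer.empty[String]

  def add(doc: String): Unit = synchronized {
    buffer += doc
    if (buffer.size >= batchSize) flush()
  }

  // Write whatever is buffered; also call this on shutdown or on a timer.
  def flush(): Unit = synchronized {
    if (buffer.nonEmpty) {
      index.addAll(buffer.toList)
      buffer.clear()
    }
  }
}

// Test double that only counts how often the index is written to.
class CountingIndex extends Index {
  var writes = 0
  var docs = 0
  def addAll(batch: Seq[String]): Unit = { writes += 1; docs += batch.size }
}
```

With a batch size of 100, 250 messages would cause two index writes plus
one more on the final flush, instead of 250 separate ones. The price is
that new messages show up in search a little later.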


> We could also have some type of pooling, but I'm not sure how
> efficient it would be. This definitely needs some benchmarks before we
> try to optimize too much.
>
> What do you think?
>

Yes. It's impossible to decide which tradeoffs to make as
long as we don't have an ESME instance running with enough active users
(and with sufficiently detailed performance monitoring enabled).

I would therefore go with the simplest possible implementation for now.
KISS!

Regards,
Markus
