esme-dev mailing list archives

From: Markus Kohler <>
Subject: Re: Performance update: Message size in memory
Date: Tue, 08 Dec 2009 12:24:14 GMT
Hi Vassil,
See my comments below ...

On Tue, Dec 8, 2009 at 12:29 PM, Vassil Dichev <> wrote:

> Hey Markus, I love your performance tests: they always uncover
> something interesting.

That's on purpose ;-)

> It's not a matter of configuration, but of coding practices. The
> stemmer in Message is part of the class, and not of the companion
> object (think "static" in Java). I will move it there and there should
> be a notable improvement.
Yes, I somehow expected that. However, making it truly static might lead to
contention: if we have only one stemmer instance, multiple threads accessing
it would have to synchronize on it and could end up blocking each other.
You could either:
1. Use a pool of stemmer objects; some classes in java.util.concurrent
should make this relatively easy.
2. Put the stemmer in a ThreadLocal variable. You then automatically get as
many stemmers as you have threads, and no synchronization is needed. I seem
to remember that accessing a ThreadLocal is not that cheap, though, so I'm
not sure whether that's the best idea.
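Here is a minimal sketch of the two options above. It is written in plain Java because the mechanism (a static holder, ThreadLocal, java.util.concurrent) is the same one Scala code uses on the JVM. ToyStemmer is a hypothetical stand-in for a stateful, non-thread-safe stemmer such as org.tartarus.snowball.ext.PorterStemmer; it is not that class's API. ThreadLocalStemming, StemmerPool and Message are illustrative names, not ESME code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a stateful, non-thread-safe stemmer such as
// org.tartarus.snowball.ext.PorterStemmer. This is NOT the Snowball API; it only
// strips a trailing "ing" or "s" so the example stays self-contained.
class ToyStemmer {
    // Mutable working buffer: sharing one instance across threads is unsafe.
    private final StringBuilder buffer = new StringBuilder();

    String stem(String word) {
        buffer.setLength(0);
        buffer.append(word.toLowerCase());
        int n = buffer.length();
        if (n > 4 && buffer.substring(n - 3).equals("ing")) {
            buffer.setLength(n - 3);
        } else if (n > 3 && buffer.charAt(n - 1) == 's') {
            buffer.setLength(n - 1);
        }
        return buffer.toString();
    }
}

// Option 2: one stemmer per thread, created lazily; no locking on the hot path.
final class ThreadLocalStemming {
    private static final ThreadLocal<ToyStemmer> STEMMER =
        ThreadLocal.withInitial(ToyStemmer::new);

    private ThreadLocalStemming() {}

    static String stem(String word) {
        return STEMMER.get().stem(word);
    }
}

// Option 1: a fixed-size pool; a thread blocks only when every instance is in use.
final class StemmerPool {
    private final BlockingQueue<ToyStemmer> pool;

    StemmerPool(int size) {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new ToyStemmer());
        }
    }

    String stem(String word) {
        ToyStemmer stemmer;
        try {
            stemmer = pool.take(); // borrow an instance, waiting if all are busy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a stemmer", e);
        }
        try {
            return stemmer.stem(word);
        } finally {
            pool.offer(stemmer); // always hand the instance back
        }
    }
}

// Hypothetical message class: it keeps only the stemmed words, never the stemmer.
final class Message {
    final List<String> stemmedWords;

    Message(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(ThreadLocalStemming.stem(w));
            }
        }
        stemmedWords = Collections.unmodifiableList(words);
    }
}
```

With either approach, each message holds only its stemmed words. The roughly 2 KB stemmer instance is therefore no longer retained per message. Whether the pool or the ThreadLocal is faster under real load would have to be measured.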

> As for the scala.xml.Elem, I'm not sure what the alternative is. I
> don't think that parsing the XML every time and converting it to a
> String would be better. The text method in particular is
> inefficient, because it scans the string character by character, does
> 5-6 comparisons for each character in order to escape the String and
> then appends each character to a StringBuilder. It's also probably not
> much better to store the XML-ified String, because you might need the
> XML structure for further processing later (e.g. digestedXHTML needs
> toXml) and then we come back to parse/toString.
I have to read the code to better understand what it's used for.

> Is retaining scala.xml.Elem that bad?

It's not as bad as the stemmer. Removing the stemmer might bring memory
usage down by about 50%. But once the stemmer is removed, toXML would be
the next big chunk, at about 30%.
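One possible middle ground between keeping the parsed XML and re-parsing it every time is to keep the compact source string and hold the parsed structure only through a java.lang.ref.SoftReference. The JVM may then drop the structure when memory gets tight, and it is re-parsed on the next access. In ESME the parser would presumably be something like scala.xml.XML.loadString. SoftParsed is a hypothetical name, and whether the occasional re-parse is cheaper overall would need profiling.

```java
import java.lang.ref.SoftReference;
import java.util.function.Function;

// Hypothetical holder: keeps the compact source text strongly reachable and the
// expensive parsed form only softly reachable. The GC may reclaim the parsed form
// under memory pressure; it is then rebuilt from the source on the next access.
final class SoftParsed<T> {
    private final String source;
    private final Function<String, T> parser;
    private SoftReference<T> cached = new SoftReference<>(null);
    private int parseCount = 0; // illustration only: how often we had to (re)parse

    SoftParsed(String source, Function<String, T> parser) {
        this.source = source;
        this.parser = parser;
    }

    synchronized T get() {
        T value = cached.get();
        if (value == null) {
            // Not parsed yet, or the GC cleared the soft reference: parse again.
            value = parser.apply(source);
            cached = new SoftReference<>(value);
            parseCount++;
        }
        return value;
    }

    synchronized int parseCount() {
        return parseCount;
    }

    String source() {
        return source;
    }
}
```

Unlike a plain lazy field, a soft reference lets the GC trade the cached structure for memory. The downside is that re-parsing happens at moments that are hard to predict.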

> Another question- how did you find that Scala's XML.loadString
> function actually reads the disk? This is something which can't be
> figured out only from memory profiling, right?
I did some quick CPU profiling last week, but didn't have much time to look
at the results; I need to spend some time on this soon.
I also asked my YourKit friends whether they still offer free licenses for
open source projects. Their new early access release has some interesting
features, and I think it can still be downloaded for free for a limited time.


> On Tue, Dec 8, 2009 at 12:30 PM, Richard Hirsch <> wrote:
> > org.tartarus.snowball.ext.PorterStemmer comes from Compass search.
> > Maybe we can configure it so that it is not retained after use.
> >
> > D.
> >
> > On Tue, Dec 8, 2009 at 1:03 AM, Markus Kohler <> wrote:
> >> Hi all,
> >> I've been busy with other things and therefore didn't find much time
> >> for ESME last week.
> >> I tried a few things with regard to performance.
> >> As you have all noticed, performance on the performance instance is
> >> currently excellent.
> >> I tried various approaches to measure it, but most failed because of the
> >> Comet requests, which the tools I usually use don't like.
> >> The best I could get are some numbers from the Firebug Firefox plugin. It
> >> seems that the response time from entering a message until it appears in
> >> the user's timeline is around 350 ms, which is really excellent. It will be
> >> even harder to measure (using the browser) how long it takes for a message
> >> to get from one user to another. I'm not sure how to do that yet. I
> >> manually tested sending messages from Chrome to Firefox, and it's really
> >> fast.
> >>
> >> I also let one of the 300+x users send 1000 messages and took some heap
> >> dumps.
> >> I'm not fully through them yet, but it's already clear that messages take
> >> up too much space.
> >> Around 1400 messages need 9.3 million bytes, which means that on average
> >> one message needs more than 6 KB!
> >> OK, there were probably also a lot of relatively long status update
> >> messages, but I still think this is too much.
> >>
> >> The reason seems to be that the messages still retain a reference to the
> >> stemmer (org.tartarus.snowball.ext.PorterStemmer), which alone takes 2 KB.
> >> Do we really need this stemmer after we have run it?
> >>
> >> Another reason is that a scala.xml.Elem is referenced in the toXML field.
> >> I guess this is the result of parsing XML. I'm not sure whether it is still
> >> needed afterwards, but storing DOM-like structures is certainly not
> >> memory efficient. originalXML looks similar.
> >>
> >> It is important to get these numbers down; otherwise memory usage will
> >> kill us as soon as a lot of messages are sent.
> >>
> >> I also asked on the Scala list about the loadXML function accessing the
> >> filesystem, but someone claimed this is not the case in trunk and asked
> >> which version we use. So maybe they can backport a fix for this.
> >> I seem to remember from some profiling that this function is still used.
> >>
> >> I haven't had time to draft a blog post yet, but I hope I can start on it
> >> on Wednesday or Thursday.
> >>
> >> Regards,
> >> Markus
> >>
> >>
> >> "The best way to predict the future is to invent it" -- Alan Kay
> >>
> >
