incubator-esme-dev mailing list archives

From Vassil Dichev <vdic...@apache.org>
Subject Re: Performance update: Message size in memory
Date Tue, 08 Dec 2009 11:29:00 GMT
Hey Markus, I love your performance tests; they always uncover
something interesting.

It's not a matter of configuration, but of coding practices. The
stemmer in Message is a member of the class rather than of the companion
object (think "static" in Java), so every Message instance holds its own
stemmer. I will move it to the companion object, which should give a
notable improvement.
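
A minimal sketch of that change, under stated assumptions: HeavyStemmer is a
hypothetical stand-in for org.tartarus.snowball.ext.PorterStemmer, and
MessagePerInstance and the simplified Message are illustrative names, not
the actual ESME classes. The synchronized call is there only because a single
shared stemmer is assumed to keep mutable state.

```scala
// Hypothetical stand-in for the real stemmer; the byte array only
// simulates the roughly 2 Kbyte of state reported in the heap dump.
final class HeavyStemmer {
  private val buffer = new Array[Byte](2048)
  // Toy stemming rule for illustration only: lowercase and drop a trailing "s".
  def stem(word: String): String = word.toLowerCase.stripSuffix("s")
}

// Before: the stemmer is an instance member, so every message retains its own copy.
class MessagePerInstance(val body: String) {
  private val stemmer = new HeavyStemmer
  def stems: Seq[String] = body.split("\\s+").toSeq.map(stemmer.stem)
}

// After: the stemmer lives in the companion object and is shared by all messages.
class Message(val body: String) {
  def stems: Seq[String] = body.split("\\s+").toSeq.map(Message.stem)
}

object Message {
  // One stemmer for the whole JVM, similar to a static field in Java.
  private val stemmer = new HeavyStemmer
  // Guard the shared instance in case the stemmer is not thread-safe.
  def stem(word: String): String = stemmer.synchronized { stemmer.stem(word) }
}
```

With this layout a heap dump should show one stemmer in total instead of one
per retained message.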

As for the scala.xml.Elem, I'm not sure what the alternative is. I
don't think parsing the XML every time and converting it to a String
would be better. The text method in particular is inefficient: it scans
the string character by character, does 5-6 comparisons per character
in order to escape it, and then appends each character to a
StringBuilder. Storing the XML-ified String is probably not much better
either, because you might need the XML structure for further processing
later (e.g. digestedXHTML needs toXml), and then we are back to
parse/toString.
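
To make the per-character cost concrete, here is a rough sketch of the kind
of escaping loop described above. EscapeSketch and escapeXml are hypothetical
names, and this is not the scala.xml implementation, only an illustration of
the pattern: a handful of comparisons and one append for every character.

```scala
object EscapeSketch {
  // Hypothetical helper: escapes a string for use in XML text or attributes.
  def escapeXml(s: String): String = {
    val sb = new StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
      // Each character is compared against every special case in turn.
      s.charAt(i) match {
        case '<' => sb.append("&lt;")
        case '>' => sb.append("&gt;")
        case '&' => sb.append("&amp;")
        case '"' => sb.append("&quot;")
        case c   => sb.append(c)
      }
      i += 1
    }
    sb.toString
  }
}
```

Doing this on every read of a message is the work that keeping the parsed
structure around avoids, at the price of the extra memory.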

Is retaining scala.xml.Elem that bad?

Another question: how did you find out that Scala's XML.loadString
function actually reads from disk? That is something which can't be
figured out from memory profiling alone, right?


On Tue, Dec 8, 2009 at 12:30 PM, Richard Hirsch <hirsch.dick@gmail.com> wrote:
> org.tartarus.snowball.ext.PorterStemmer is from the compass search.
> Maybe we can configure it, so that it is not retained after usage.
>
> D.
>
> On Tue, Dec 8, 2009 at 1:03 AM, Markus Kohler <markus.kohler@gmail.com> wrote:
>> Hi all,
>> I've been busy otherwise, and therefore didn't find much time for ESME last
>> week.
>> I tried a few things with regards to performance.
>> As you all noticed, the performance on the performance instance is currently
>> excellent.
>> I tried various approaches to measure it, but most failed due to the comet
>> requests, which the tools I usually use don't like.
>> The best I could get are some numbers from the Firebug Firefox plugin. It
>> seems that the response time from entering a message until it appears in the
>> user's timeline is around 350ms, which is really excellent. It will be even
>> harder to measure (using the browser) how long it takes for a message to get
>> from one user to another. I'm not sure how to do that yet. I tested manually
>> sending messages from Chrome to Firefox and it's really fast.
>>
>> I also let one of the 300+x users send 1000 messages and did some heap
>> dumps.
>> I'm not yet fully through them, but it's already clear that messages take up
>> too much space.
>> Around 1400 messages need 9.3 million bytes, which means that on
>> average one message needs around 6 Kbyte!
>> OK, there were probably also a lot of relatively long status update messages,
>> but I still think this is too much.
>>
>> The reason seems to be that the messages still retain a reference to the
>> Stemmer (org.tartarus.snowball.ext.PorterStemmer), which alone takes 2 Kbyte.
>> Do we really need this Stemmer after we have run it?
>>
>> Another reason is that scala.xml.Elem is referenced in the toXML field. I
>> guess this is the result of parsing XML. I'm not sure whether it is still
>> needed after parsing is done, but storing DOM-like structures is certainly
>> not memory efficient. originialXML looks similar.
>>
>> It is important to get these numbers down; otherwise we will be killed
>> by memory usage as soon as a lot of messages are sent.
>>
>> I also asked on the Scala list about the loadXML function accessing the
>> filesystem, but someone claimed this would not be the case in trunk and
>> asked for the version. So maybe they can backport a fix for this.
>> I seem to remember from some profiling that this function is still used.
>>
>> Haven't had any time to draft a blog, but I hope I can start with that on
>> Wednesday or Thursday.
>>
>> Regards,
>> Markus
>>
>>
>> "The best way to predict the future is to invent it" -- Alan Kay
>>
>
