esme-dev mailing list archives

From: Vassil Dichev <vdic...@apache.org>
Subject: Re: We are down to 4.6 Mbyte
Date: Sun, 29 Nov 2009 13:03:41 GMT
> I haven't looked in detail into last night's profiling results, but it seems
> we are down to 4.6 Mbyte! That's a 5000x improvement!
> I plan to document the details on Monday night. If there's time left I will
> also start drafting a longer blog post. It would be great if Vassil
> would provide a short description/explanation of his changes.

OK, this is going to be embarrassing for me: this is not actually an
improvement, but a return to the performance capabilities ESME had
several months ago.

I'm not surprised that it was 5000x worse, because every time the
public or friends' timeline was displayed for any user, every message
was fetched from the database, converted to XML, and transformed into
XHTML and JSON. Not only that, but every time a new message was
received, the timelines of all users receiving that message were
re-rendered, which meant reloading from the DB and repeating the same
XML acrobatics for all 20 messages of each of the 2 timelines, i.e.
40 messages processed per user.

To top it off, when the Textile parser was activated, its overhead was
multiplied 40 times per user, which for 300 users means 12000 messages
re-rendered, just because one user decided to send a message! Yes, this
sounds horrible. David was indeed correct that the Textile parser
itself was not the main culprit; it merely magnified the effects of a
more serious bug.

The problem was the Message.findMessages method. It is supposed to
cache messages using an LRU strategy. When I introduced access pools,
message access had to be controlled not only when loading messages
from the DB, but also when reading them from the cache. So I discarded
the pool-restricted messages from the temporary structure that was to
be returned to the user. The discarded messages would then go to the
finder method, where the constructed query would make sure only
messages from valid pools were returned (inefficiency one).
Furthermore, I introduced a bug whereby messages from the public pool
were also always discarded from the cache and fetched from the DB
(inefficiency two). So everything still worked, but in practice the
cache was never used.
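
Roughly, the buggy pattern looked like this (a simplified sketch with
made-up names, not the real Message.findMessages code):

    import scala.collection.mutable

    case class Msg(id: Long, pool: Option[Long])  // pool == None means public

    class MessageFinder(validPools: Set[Long]) {
      // Stand-in for the LRU cache; assume it is already populated
      private val cache = mutable.Map[Long, Msg]()

      // Stand-in for the DB query filtered by valid pools
      // (inefficiency one: it runs on every call for the discarded messages)
      private def findInDb(ids: Seq[Long]): Seq[Msg] =
        ids.map(id => Msg(id, None))

      def findMessages(ids: Seq[Long]): Seq[Msg] = {
        val cached = ids.flatMap(cache.get)
        // Buggy filter: pool-restricted messages are dropped so the DB can
        // enforce access control, but public messages (pool == None) are
        // dropped as well (inefficiency two)...
        val usable  = cached.filter(m => m.pool.exists(validPools.contains))
        val missing = ids.filterNot(id => usable.exists(_.id == id))
        // ...so in practice nearly everything falls through to the DB.
        usable ++ findInDb(missing)
      }
    }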

In conclusion, this is just one more argument for keeping messages in
memory, instead of fetching them from the DB.

Another important conclusion is that performance tests are just as
important as unit and integration tests and can uncover functional
problems too, especially with caches.
