From: Vassil Dichev
To: esme-dev@incubator.apache.org
Date: Thu, 26 Nov 2009 16:41:09 +0200
Subject: Re: Removing textile from code base

> We're not looking at the root cause of the problem.  The Textile stuff is a
> hit if we run it on each message for each user.  This is no different than
> having an SQL query in the code that's a Cartesian product and throwing out
> SQL because of it.
>
> Let's find out where and why we keep loading the same message from the RDBMS
> rather than going to the message cache.
>
> Let's find out why we're hitting the RDBMS in general... there are
> abstractions in the system (or at least were) that make RDBMS access a local
> thing rather than a global thing.
>
> I'll have time on Monday to look at this, but running around chopping off
> pieces of code and changing functionality isn't going to get us any closer
> to solving the problem... it's just going to cause the problem to be
> manifest elsewhere.

I did not remove the Textile parser only because it might cause
problems; I also think it doesn't fit very well and is a bit of
overkill.

First of all, for messages, headings, tables and paragraphs are not
such a good fit conceptually.

Second, some elements from MsgParser clash with the Textile parser
ones. For instance, links to images cannot be parsed because MsgParser
runs first and converts the link to a URL element.

Third, the way parsing with Textile is done is currently inefficient
anyway. I parse every text element separately. Since text can be
separated by URLs, tags and usernames, I may invoke the Textile
parser several times per message. For instance, this message has 4 text
elements => 4 Textile invocations:

message with #tag and @username and http://blog.esme.us url in text

Yes, if the performance analysis is correct, the Textile parser is not
the cause of the problem. Still, it might be easier to solve the
problem without it. We even intended to include pluggable parser
implementations some day.

AFAICT, the problem was not that the RDBMS is queried every time
(although that's how the PublicTimeline has worked from day 1 if I
remember correctly). The problem, as explained by Markus, was that the
message is formatted from the raw string every time it is accessed for
rendering a timeline. The RDBMS was mentioned tangentially by Michael
Bechauf (or someone else?). Markus, did I get this correctly?

I still don't see how the message could be parsed several times, since
digestedXHTML is lazy and so will be cached (this alone should make it
*way* easier to write efficient implementations in Scala than in Java).

I want to profile the stack traces where most strings are allocated.
This should answer some questions.

I also plan to remove rendering the public timeline on each user's
timeline page. First, because it's not cached, and second, because
it's not updated in real time like the friends' timeline, but only
after an explicit refresh of the browser. So the public timeline is
not only slow, but might also confuse users, who will expect it to
work like the personal timeline (as the layout is the same).

Vassil
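
P.S. A minimal sketch of the two technical points above, in case it
helps the discussion. All names here (Message, digested,
expensiveFormat, textElements) are hypothetical stand-ins, not ESME's
actual MsgParser or Textile code, and the regex is only a rough
approximation of how URLs, tags and usernames split a message.

```scala
// Hypothetical sketch: none of these names come from the ESME code base.
object LazyDigestSketch {
  // Counts how often the stand-in formatter runs.
  var formatCalls = 0

  // Stand-in for an expensive markup pass such as Textile.
  def expensiveFormat(raw: String): String = {
    formatCalls += 1
    "<p>" + raw + "</p>"
  }

  class Message(val body: String) {
    // Computed on first access, then cached for the life of THIS instance.
    lazy val digested: String = expensiveFormat(body)
  }

  // Rough approximation of the special elements: URLs, #tags, @usernames.
  private val Special = """(https?://\S+|#\w+|@\w+)""".r

  // The plain-text runs left between the special elements.
  def textElements(body: String): List[String] =
    Special.split(body).toList.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val body = "message with #tag and @username and http://blog.esme.us url in text"

    val msg = new Message(body)
    val first = msg.digested
    val second = msg.digested
    println(formatCalls)               // 1: the second access hit the cached value
    println(first eq second)           // true: same cached String

    // A freshly loaded instance of the same message has its own lazy val.
    val reloaded = new Message(body)
    println(reloaded.digested.length)
    println(formatCalls)               // 2: formatted again for the new instance

    println(textElements(body).size)   // 4 text elements
  }
}
```

The point of the second half of main is that a lazy val is cached per
object. If the same message were loaded into a new object each time
instead of coming from the message cache, the formatting would be
repeated even though the field is lazy. Whether that is what happens
in ESME is exactly what profiling should tell us.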