From: Philippe Mouawad
To: "dev@jmeter.apache.org"
Date: Thu, 26 Sep 2013 23:05:46 +0200
Subject: Re: htmlParser.className default value
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=bcaec516197dd43a9104e74fbbcd

--bcaec516197dd43a9104e74fbbcd
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Sep 26, 2013 at 10:58 PM, sebb wrote:

> On 26 September 2013 21:48, Philippe Mouawad wrote:
> > Hello,
> > I really think this setting should be changed as
> > HtmlParserHTMLParser is really catastrophic in terms of performance and
> > memory use.
> >
> > Or at least a note should be added, but my preference goes to switching to
> > REGEXP which seems to be doing the job.
>
> I don't think we should change the default; it may well break test
> plans as commenting out sections is a common practise.
>

Why not change the default and document that users can switch the parser back to the old one?
A newcomer won't read all the documentation at once; in my opinion, defaults should be the best options for performance.
If users have issues with Regexp, we will get Bugzilla reports and fix them. They can provide the page for which parsing failed; as we have already had a report on this, it's easy.

Whereas if we keep it like this, users will face OOM on high-load tests because of this, and I am not sure they will report it; if they do, it could be much harder to find out it was due to this.
And we will always have this "urban legend" about JMeter having OOM, which frankly is starting to upset me :-)

> However by all means add a note to jmeter.properties and
> component_reference
>
> > Regards
> > Philippe
> >
> >
> > On Sun, Mar 3, 2013 at 9:06 PM, Philippe Mouawad <philippe.mouawad@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I made recently a Real world test which downloaded resources.
> >> As the site started to slow down, I ended up having an OOM.
> >>
> >> Analyzing Heap Dump, I noticed one JMeterThread held around 3 MB which
> >> majority was taken by DOM build by htmlparser.
> >>
> >> So I think Regexp is far more efficient on memory usage. But if you say it
> >> is a quick and dirty alternative then it's another point.
> >>
> >> I wonder if it would not be interesting to explore using JSOUP in a new
> >> implementation.
> >>
> >> Regards
> >> Philippe
> >>
> >>
> >> On Sun, Mar 3, 2013 at 3:42 PM, sebb wrote:
> >>
> >>> On 2 March 2013 19:42, Philippe Mouawad wrote:
> >>> > Hello,
> >>> > I was wondering if there is any reason for htmlParser.className default
> >>> > value being org.apache.jmeter.protocol.http.parser.HtmlParserHTMLParser
> >>> > and not org.apache.jmeter.protocol.http.parser.RegexpHTMLParser
> >>> >
> >>> > It seems to me the latter is much more efficient than the current default
> >>> > value.
> >>>
> >>> I think one would need to benchmark that to see how much faster it is.
> >>>
> >>> > Any objection on changing to
> >>> > org.apache.jmeter.protocol.http.parser.RegexpHTMLParser
> >>>
> >>> The Regex version does not take account of context, so will find
> >>> references in comment sections.
> >>>
> >>> It was intended as a quick and dirty alternative.
> >>>
> >>> > --
> >>> > Regards.
> >>> > Philippe
> >>>
> >>
> >>
> >>
> >> --
> >> Regards.
> >> Philippe Mouawad.
> >>
> >>
> >>
> >
> >
> > --
> > Regards.
> > Philippe Mouawad.
>

--
Regards.
Philippe Mouawad.
--bcaec516197dd43a9104e74fbbcd--
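Editorial illustration of the objection quoted in the thread ("The Regex version does not take account of context, so will find references in comment sections"). This is only a minimal sketch: the class name, both patterns and both helper methods are hypothetical and are not taken from JMeter's RegexpHTMLParser or HtmlParserHTMLParser. It shows how a plain regex scan would also report an image that sits inside a commented-out block, and how a scan that first strips comments would not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedResourceDemo {

    // Hypothetical, simplified pattern: captures the src attribute of <img> tags.
    // This is NOT the expression used by JMeter's RegexpHTMLParser.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\s[^>]*?src\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Matches HTML comments, including ones that span several lines.
    private static final Pattern HTML_COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Context-free scan: every textual match counts, even inside comments.
    public static List<String> findImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    // Rough stand-in for a context-aware parser: drop comments, then scan.
    public static List<String> findImageUrlsIgnoringComments(String html) {
        String withoutComments = HTML_COMMENT.matcher(html).replaceAll("");
        return findImageUrls(withoutComments);
    }

    public static void main(String[] args) {
        String html = "<html><body>\n"
                + "<img src=\"live.png\">\n"
                + "<!-- <img src=\"disabled.png\"> -->\n"
                + "</body></html>";

        System.out.println(findImageUrls(html));                 // prints [live.png, disabled.png]
        System.out.println(findImageUrlsIgnoringComments(html)); // prints [live.png]
    }
}
```

In a test plan where page sections are commented out, the first behaviour would mean extra embedded-resource requests, which is the compatibility risk raised against changing the default. The parser is selected through the htmlParser.className property discussed in the thread.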