jmeter-dev mailing list archives

From Philippe Mouawad <philippe.moua...@gmail.com>
Subject Re: htmlParser.className default value
Date Thu, 26 Sep 2013 21:05:46 GMT
On Thu, Sep 26, 2013 at 10:58 PM, sebb <sebbaz@gmail.com> wrote:

> On 26 September 2013 21:48, Philippe Mouawad <philippe.mouawad@gmail.com>
> wrote:
> > Hello,
> > I really think this setting should be changed as
> > HtmlParserHTMLParser is really catastrophic in terms of performance and
> > memory use.
> >
> > Or at least a note should be added, but my preference goes to switching
> to
> > REGEXP which seems to be doing the job.
>
> I don't think we should change the default; it may well break test
> plans as commenting out sections is a common practise.
>

Why not change the default and document how users can switch back to the
old parser?
Take a newcomer: he won't read all the documentation at once. In my opinion,
defaults should be the best options for performance.
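
For reference, switching back could be documented as a single line in
jmeter.properties. This is only a sketch, using the property name and class
names quoted further down in this thread; the comments are illustrative and
not existing documentation:

```properties
# Proposed new default discussed in this thread
htmlParser.className=org.apache.jmeter.protocol.http.parser.RegexpHTMLParser

# To go back to the current default, use this line instead:
#htmlParser.className=org.apache.jmeter.protocol.http.parser.HtmlParserHTMLParser
```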

If users have issues with Regexp, we will get Bugzilla reports and fix them;
they can provide the page for which parsing failed. As we have already had a
report on this, it's easy.
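
To make the concern about commented-out sections concrete, here is a minimal
sketch of how a context-free regex picks up a resource inside an HTML comment.
It is NOT the code of RegexpHTMLParser; the class name RegexCommentDemo, the
method extractImageUrls and the simplified pattern are hypothetical and only
illustrate the behaviour described in the quoted mail:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCommentDemo {
    // Hypothetical, simplified pattern: captures the double-quoted src of <img> tags.
    // It knows nothing about HTML structure, so comments are not skipped.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\s[^>]*?src\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> extractImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1)); // group 1 is the src value
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<html><body>\n"
                + "<img src=\"/live.png\">\n"
                + "<!-- <img src=\"/disabled.png\"> -->\n"
                + "</body></html>";
        // Both URLs are returned, including the one inside the comment.
        System.out.println(extractImageUrls(html));
    }
}
```

A page like this is the kind of sample a user could attach to a Bugzilla
report when the regex-based parser downloads something it should not.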

Whereas if we keep it as it is, users will face OOM on high-load tests
because of this, and I am not sure they will report it; if they do, it
could be much harder to find out that this was the cause.
And we will always have this "urban legend" about JMeter having OOM, which
frankly is starting to upset me :-)


> However by all means add a note to jmeter.properties and
> component_reference
>
> > Regards
> > Philippe
> >
> >
> > On Sun, Mar 3, 2013 at 9:06 PM, Philippe Mouawad <
> philippe.mouawad@gmail.com
> >> wrote:
> >
> >> Hello,
> >>
> >> I made recently a Real world test which downloaded resources.
> >> As the site started to slow down, I ended up having an OOM.
> >>
> >> Analyzing the heap dump, I noticed one JMeterThread held around 3 MB, the
> >> majority of which was taken by the DOM built by htmlparser.
> >>
> >> So I think Regexp is far more efficient on memory usage. But if you say
> it
> >> is a quick and dirty alternative then it's another point.
> >>
> >> I wonder if it would not be interesting to explore using JSOUP in a new
> >> implementation.
> >>
> >> Regards
> >> Philippe
> >>
> >>
> >> On Sun, Mar 3, 2013 at 3:42 PM, sebb <sebbaz@gmail.com> wrote:
> >>
> >>> On 2 March 2013 19:42, Philippe Mouawad <philippe.mouawad@gmail.com>
> >>> wrote:
> >>> > Hello,
> >>> > I was wondering if there is any reason for htmlParser.className
> default
> >>> > value being
> org.apache.jmeter.protocol.http.parser.HtmlParserHTMLParser
> >>> and
> >>> > not org.apache.jmeter.protocol.http.parser.RegexpHTMLParser
> >>> >
> >>> > It seems to me the latter is much more efficient than the current
> >>> default
> >>> > value.
> >>>
> >>> I think one would need to benchmark that to see how much faster it is.
> >>>
> >>> > Any objection on changing to
> >>> > org.apache.jmeter.protocol.http.parser.RegexpHTMLParser
> >>>
> >>> The Regex version does not take account of context, so will find
> >>> references in comment sections.
> >>>
> >>> It was intended as a quick and dirty alternative.
> >>>
> >>> > --
> >>> > Regards.
> >>> > Philippe
> >>>
> >>
> >>
> >>
> >> --
> >> Cordialement.
> >> Philippe Mouawad.
> >>
> >>
> >>
> >
> >
> > --
> > Cordialement.
> > Philippe Mouawad.
>



-- 
Cordialement.
Philippe Mouawad.
