jmeter-user mailing list archives

From Duc Chau <chauman...@gmail.com>
Subject Re: Regulation Expression alternative
Date Mon, 24 Jan 2011 22:57:30 GMT
Your sample HTML table is not well structured, but this may help.

Try <span class="sbListText">(.+?)</span>. I just tested it using
http://gskinner.com/RegExr/. It returns data1, data2, etc.
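
As a rough illustration of what that pattern captures, here is a minimal Python sketch. It uses Python's re module, not JMeter's Regular Expression Extractor, so treat it as an approximation of the behaviour. The row_html string is a hand-trimmed subset of the data row quoted below, and the names row_html, raw, clean, and cells are only for illustration. A captured group can still contain markup (the link around data1) or &nbsp;, so a cleanup step is assumed here.

```python
import re
from html import unescape

# Hand-trimmed subset of the data row from the quoted message (assumption:
# the real page uses the same span structure for every data cell).
row_html = (
    '<tr><td class="sbListOddCell"><span class="sbListText">'
    '<a class="sbLinkTableDisplay" title="data1">data1</a></span></td>'
    '<td class="sbListOddCell"><span class="sbListText">&nbsp;</span></td>'
    '<td class="sbListOddCell"><span class="sbListText">data2</span></td>'
    '<td class="sbListOddCell"><span class="sbListText">data3</span></td></tr>'
)

# The suggested pattern: lazily capture everything up to the next </span>.
pattern = re.compile(r'<span class="sbListText">(.+?)</span>')
raw = pattern.findall(row_html)


def clean(fragment):
    """Drop any remaining tags, decode entities such as &nbsp;, trim spaces."""
    text = re.sub(r'<[^>]+>', '', fragment)
    return unescape(text).strip()


cells = [clean(fragment) for fragment in raw]
print(cells)
```

Empty strings stand for the &nbsp; cells, so column positions are preserved within the row.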

On Tue, Jan 25, 2011 at 8:31 AM, Deepak Shetty <shettyd@gmail.com> wrote:

> What is it that you want to select? All the columns that are not titles?
> That would be something like
> //tbody/tr/td/span (but this will flatten out the structure).
>
> regards
> deepak
>
> On Mon, Jan 24, 2011 at 10:08 AM, thanh nguyen <mailinglistfan@gmail.com>
> wrote:
>
> > Felix,
> >
> > I'll have a look at XPath. It looks interesting, but I can't find any
> > example code for XPath.
> > Thank you
> > Thanh
> >
> > PS: This is the table I'm working on. The 1st row is the title row; the
> > 2nd row contains data. I want to extract data1, data2, and so on. The
> > regular expression reads row by row. In the BeanShell script I run 2
> > loops: one over rows and one over columns. There are odd-numbered rows
> > and even-numbered rows.
> >
> >
> > <table>
> > <tr><th class="sbListHeaderCellEnd" scope="col" valign="top"
> > width="5"><img
> > alt="" height="5" src="/assets/common/img/cnr_t_tl.gif"
> > width="5"></th><th
> > class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_name')" onclick="submitForm1023(event);return
> > false;" title="Sort by column Title">Title1</a></span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title2</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_deliveryType')"
> > onclick="submitForm1024(event);return false;" title="Sort by column
> > Delivery
> > Type">Title3</a></span></th><td class="sbListColumnSpacer"><img
> > alt=""
> > border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_regStartDate')"
> > onclick="submitForm1025(event);return false;" title="Sort by column
> > Registration Date">Title4</a></span></th><td
> > class="sbListColumnSpacer"><img
> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
> > scope="col"><img
> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_completionStatus')"
> > onclick="submitForm1026(event);return false;" title="Sort by column
> > Completion Status">Title5</a></span></th><td
> > class="sbListColumnSpacer"><img
> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
> > scope="col"><img
> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_completionDate')"
> > onclick="submitForm1027(event);return false;" title="Sort by column Date
> > Marked Complete">Title6</a></span></th><td
> > class="sbListColumnSpacer"><img
> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
> > scope="col"><img
> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title7</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_score')"
> > onclick="submitForm1028(event);return
> > false;" title="Sort by column Score">Title8</a></span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_grade')"
> > onclick="submitForm1029(event);return
> > false;" title="Sort by column Grade">Title9</a></span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title10</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title11</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title12</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title13</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_startDate')"
> > onclick="submitForm1030(event);return false;" title="Sort by column
> > Offering
> > Start Date">Title14</a></span></th><td class="sbListColumnSpacer"><img
> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
> > scope="col"><img
> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title15</span></th><th align="right"
> > class="sbListHeaderCellEnd" scope="col" valign="top" width="5"><img
> > alt=""
> > height="5" src="/assets/common/img/cnr_t_tr.gif" width="5"></th></tr>
> >
> > <tr><td class="sbListOddCellEnd"></td><td class="sbListOddCell"><span
> > class="sbListText"><a class="sbLinkTableDisplay" doTruncate="false"
> > href="javascript:void('titleLink')" onclick="submitForm1031(event);return
> > false;" title="data1">data1</a></span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">data2</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">data3</span></td><td
> > class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText" nowrap="nowrap"><span
> > class="sbListText">data4</span><br><a class="sbLinkTableDisplay"
> > doTruncate="false" href="javascript:void('blah')"
> > onclick="submitForm1033(event);return false;" title="blah
> > blah">blah</a></span></td><td class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">data5</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">data6</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">data7</span></td><td
> > class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">data8</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">data8</span></td><td
> > class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell" nowrap><a
> > class="sbLinkTableDisplay" doTruncate="false"
> > href="javascript:void('editLink')" onclick="submitForm1035(event);return
> > false;" title="Edit">Edit</a><br><a class="sbLinkTableDisplay"
> > doTruncate="false" href="javascript:void('deleteLink')"
> > onclick="submitForm1036(event);return false;"
> > title="Delete">Delete</a><br><br></td><td
> > class="sbListOddCellEnd"></td></tr><tr>
> >
> > </table>
> >
> >
> >
> > On Mon, Jan 24, 2011 at 10:34 AM, Felix Frank <ff@mpexnet.de> wrote:
> >
> > > On 01/24/2011 04:27 PM, thanh nguyen wrote:
> > > > Hi everyone,
> > > >
> > > > I have a big HTML table from which I need to extract data. The table
> > > > has several columns. The regular expression required to do the
> > > > extraction is very long and complex. The code is hard to debug and to
> > > > maintain. I'd like to know what the alternatives are. Is there an HTML
> > > > parser that creates DOM objects? I could program a postprocessor in
> > > > BeanShell...
> > > >
> > > > Thanks a lot
> > >
> > > That would be the XPath Extractor, but maybe someone can help you build
> > > a simpler regex instead (you need to share more details for this to
> > > happen).
> > >
> > > Regards,
> > > Felix