nifi-users mailing list archives

From Yolanda Davis <yolanda.m.da...@gmail.com>
Subject Re: working with HTML table
Date Wed, 31 Aug 2016 14:06:55 GMT
Hi Stephane,

Here's something I hope can help.  In GetHTMLElement, instead of using the
selector "table td", try "table tr" with an output type of "Text" and a
destination of flowfile-content.  This should create one flow file per row,
each containing the numeric text from the td elements in that row.  From
there you can use the ExecuteScript processor to trim the whitespace,
convert the text values into numbers and sum them.  I was able to get this
to work with the JavaScript (ECMAScript) below, using the example HTML you
provided:

var flowFile = session.get();
if (flowFile != null) {

  var StreamCallback = Java.type("org.apache.nifi.processor.io.StreamCallback");
  var IOUtils = Java.type("org.apache.commons.io.IOUtils");
  var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");

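  flowFile = session.write(flowFile,
    new StreamCallback(function(inputStream, outputStream) {
        // Read the row text that GetHTMLElement wrote to the flow file content
        var text = String(IOUtils.toString(inputStream, StandardCharsets.UTF_8));
        // Split on any run of whitespace so newlines and extra spaces are handled
        var res = text.trim().split(/\s+/);
        var count = 0;
        for (var i = 0; i < res.length; i++) {
            var value = parseInt(res[i], 10);
            // NaN never compares equal to anything, so test with isNaN()
            if (!isNaN(value)) {
                count += value;
            }
        }
        outputStream.write(count.toString().getBytes(StandardCharsets.UTF_8));
    }));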
  flowFile = session.putAttribute(flowFile, "filename",
    flowFile.getId() + '_count.txt');
  session.transfer(flowFile, REL_SUCCESS);
}
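
If you want to check the summing logic on its own before wiring it into
ExecuteScript, here is a minimal standalone sketch. It assumes each row's
text arrives as whitespace-separated numbers (for example "11 12"), which is
what I would expect from the "table tr" selector with the "Text" output type.
The function name sumRowText and the sample row strings are only
illustrative; they are not part of NiFi.

```javascript
// Hypothetical helper (not a NiFi API): sums the integers found in one row's text.
function sumRowText(text) {
  // Split on any run of whitespace; trim first to avoid empty leading tokens.
  var tokens = String(text).trim().split(/\s+/);
  var total = 0;
  for (var i = 0; i < tokens.length; i++) {
    var value = parseInt(tokens[i], 10);
    // Skip tokens that are not numbers; NaN must be detected with isNaN().
    if (!isNaN(value)) {
      total += value;
    }
  }
  return total;
}

// Assumed row text for the two <tr> elements of the sample table.
var rows = ["11 12", "\n  21\n  22\n"];
var sums = rows.map(sumRowText);
console.log(sums.join(" and ")); // prints "23 and 43"
```

Non-numeric tokens are skipped rather than turning the whole sum into NaN,
so a stray header or label cell should not break the result.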

I've attached the template I used to do this which hopefully can help as
well.  Please let me know if you have any questions.

Yolanda


On Wed, Aug 31, 2016 at 3:52 AM, <Stephane.Tinseau@thomsonreuters.com>
wrote:

> Hi All,
>
>
>
> I’m trying to extract values from an HTML table and do calculations on
> them with NiFi.
>
> The purpose of the test is to add up the TD values in each TR and
> output the result to a file.
>
> For this sample the results should be 23 and 43.
>
>
>
> My table looks like
>
>
>
> <table>
>      <tr>
>           <td>11</td>
>           <td>12</td>
>      </tr>
>      <tr>
>           <td>21</td>
>           <td>22</td>
>      </tr>
> </table>
>
> My NIFI workflow is
>
>
>
> InvokeHTTP > Response > GetHTMLElement > Success > PutFile
>
>
>
> The CSS Selector for GetHTMLElement is table td.
>
> I know that GetHTMLElement produces 0-N elements, but I don’t know how I
> can perform calculations on them.
>
>
>
> Any help would be appreciated.
>
>
>
> Thanks
>
> Regards
>
> Stephane
>
>
>
> *Stephane Tinseau*
>
> *Thomson Reuters*
> stephane.tinseau@thomsonreuters.com
> thomsonreuters.com
>
>
>



-- 
--
yolanda.m.davis@gmail.com
@YolandaMDavis
