nifi-users mailing list archives

From Sven Davison <>
Subject RE: GetHTTP->ExtractText (Regex/User problem?)
Date Mon, 20 Jun 2016 18:50:42 GMT
Awesome. It’s coming along, and the overhead it might have is worth the stress-free setup.

Now I want to take the content of it and put it into a variable so it’s easier for me to
understand how to use it. I’m TRYING to get the variable (joke) filled with the content
of the tag. I can write it out to a file just fine, but I’m trying to avoid a bunch of file I/O overhead.


Sent from Mail for Windows 10

From: Simon Elliston Ball
Sent: Monday, June 20, 2016 1:52 PM
Cc: Lee Laim
Subject: Re: GetHTTP->ExtractText (Regex/User problem?)

is something of a classic on this subject.

I would recommend using EvaluateXPath/EvaluateXQuery or GetHTMLElement. These
may be a little heavier on processing, but will certainly save you a lot of problems with
parsing. GetHTMLElement lets you use CSS selectors against HTML, which is a more intuitive
and robust way to parse HTML.
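
To illustrate the structured approach outside NiFi, here is a minimal sketch in Python that uses the standard library's limited XPath support to pick out an element by its class attribute. It is only a stand-in for the idea, not NiFi code: the sample page is made up, and it assumes the markup is well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; a real response may not parse as XML.
page = (
    '<html><body><h3>Title</h3>'
    '<div class="content">this is the content I want to pull</div>'
    '</body></html>'
)

root = ET.fromstring(page)

# ElementTree supports a small XPath subset, including attribute predicates.
node = root.find(".//div[@class='content']")

# Fall back to None when the element is not present.
content = node.text if node is not None else None
print(content)
```

The selection is driven by the document structure rather than by the exact characters around the tag, so extra attributes or whitespace inside the element are less likely to break it.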


On 20 Jun 2016, at 18:43, Sven Davison <> wrote:

I had tried that but got a NULL value result.  Is there a setting w/in the extractor that
I need to change too?
From: Lee Laim
Sent: Monday, June 20, 2016 12:56 PM
Subject: Re: GetHTTP->ExtractText (Regex/User problem?)
Hi Sven, 
give this a try:
<div class="content">(.*?)<\/div>
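
A quick way to check a pattern like this outside NiFi is sketched below. It uses Python's `re` module as a stand-in for the Java regex engine, on the assumption that the two behave the same for a pattern this simple. The sample page and the `extract_content` helper are hypothetical. The last lines show one possible cause of an empty result: a pattern typed with typographic (curly) quotes, which mail clients often substitute, will not match the straight quotes in the page source.

```python
import re

# Hypothetical sample page; the real GetHTTP response will differ.
html = (
    '<html><body>'
    '<div class="content">this is the content I want to pull</div>'
    '</body></html>'
)

# Lazy capture group between the opening and closing tags.
# re.DOTALL lets "." span newlines in case the content is multi-line.
pattern = re.compile(r'<div class="content">(.*?)</div>', re.DOTALL)

def extract_content(text):
    """Return the first captured group, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None

joke = extract_content(html)
print(joke)

# The same pattern written with typographic quotes (U+201D) finds nothing,
# because the page uses plain ASCII double quotes.
curly = re.compile('<div class=\u201dcontent\u201d>(.*?)</div>', re.DOTALL)
print(curly.search(html))
```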
On Mon, Jun 20, 2016 at 10:25 AM, Sven Davison <> wrote:
I have looked at the example for extracting text. I saw that the example pulls the content between
the <title> tags. I’ve changed it to pull from the <h3> tags w/o problem. The
problem I’m having is pulling from something a bit more specific. I’m sure the problem
is with my understanding/usage of REGEX.
I’m trying to pull the content from this example.
<div class="content">this is the content I want to pull</div>
Any help would be super awesome. I’ve been banging my head for a bit here.
