nifi-users mailing list archives

From John Burns <jzbu...@gmail.com>
Subject Re: ExtractText Processor
Date Thu, 25 Feb 2016 09:44:58 GMT
Hi,

Thank you for the reply. I am trying to solve something I thought would be
fairly simple, but I am not having much success:

Consider the string "my friend and I went for a long walk. It was raining
and it was very cold". Tested against the single Java regex
(.{9}and.{9})+ it yields two matches: "y friend and I went f" and "raining
and it was v".
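
For reference, the plain-Java behaviour described above can be reproduced with a minimal sketch like the one below. The class and method names (AllMatches, findAll) are purely illustrative and not part of NiFi; the only real API used is java.util.regex. Calling Matcher.find() in a loop walks through every non-overlapping match, which is the grep-like result I am after.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only (not a NiFi API).
class AllMatches {

    // Returns every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // Each call to find() resumes scanning after the previous match.
        while (m.find()) {
            matches.add(m.group()); // group() is the whole match (group 0)
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "my friend and I went for a long walk. "
                + "It was raining and it was very cold";
        // Brackets make any leading or trailing spaces in a match visible.
        for (String s : findAll("(.{9}and.{9})+", text)) {
            System.out.println("[" + s + "]");
        }
    }
}
```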

In NiFi I wish to do something similar, i.e., capture all the matching
strings for a given regex (similar to grep). When I run the above regex in
NiFi I see only the first match but not the second.

Could you advise how I can access all matches for the regex? The use case
here is to monitor websites for a specific word and extract (say) 10
characters either side of the matching word, for all matches on the site.

Thanks again

John


On Mon, Feb 22, 2016 at 7:05 AM, Conrad Crampton <
conrad.crampton@secdata.com> wrote:

> Hi John,
> If you add a property for your regexp, called matches for example, that has
> several capture groups in it, e.g.
> matches (?:^(.+) (\d+)$)
> If this matches the incoming flow file, then you will end up after
> processing with 3 attributes.
> matches
> matches.1
> matches.2
>
> The matches and matches.1 attributes hold the same value (that of the first
> capture group). If you set ‘Include Capture Group 0’ to true you get an
> additional attribute, matches.0, that holds the whole match (as with the
> Java regex classes).
>
> HTH,
> Conrad
>
> From: John Burns <jzburns@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Sunday, 21 February 2016 at 20:04
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: ExtractText Processor
>
> Hi,
>
> I'm using ExtractText processor to monitor a website for specific content
> terms and log matches to a database. However, according to the documents on
> ExtractText ".....If the Regular Expression matches more than once, only
> the first match will be used"
>
> Do I understand this correctly as meaning that only the first regex match
> of a given term will be captured (as opposed to how grep works, for
> example)? I want to capture all occurrences of the match, not just the first.
>
> Any help would be appreciated.
>
> Many thanks
>
> John
