nifi-users mailing list archives

From "Robinson, Richard A CTR USSOCOM HQ" <Richard.Robinson....@socom.mil>
Subject RE: ExtractText usage
Date Tue, 08 Sep 2015 17:52:47 GMT
Chris,

Although the output you described did not exactly match what I would expect, I think Bryan
is on the right track.

When I was playing with RouteOnContent I had similar problems finding lines that started with
'"o'.
The regex that worked was (?m)^\"o

You may want something like  (?m)^(\"R.*)$
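
To illustrate why the (?m) flag matters, here is a minimal Java sketch (NiFi's regex properties use java.util.regex syntax). The class name, method name and sample content are made up for the example and are not part of NiFi; the sample is shortened from Chris's file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineRegexDemo {

    // Returns every line of `content` that starts with "R (quote included).
    static List<String> linesStartingWithR(String content) {
        // (?m) enables MULTILINE mode, so ^ and $ match at every line
        // boundary instead of only at the start and end of the whole input.
        // In Java source, "\"" is a plain quote; the regex seen by the
        // engine is (?m)^("R.*)$
        Pattern p = Pattern.compile("(?m)^(\"R.*)$");
        Matcher m = p.matcher(content);
        List<String> out = new ArrayList<>();
        while (m.find()) {          // keep searching until no more matches
            out.add(m.group(1));    // group 1 is the captured line
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened sample data.
        String content = "\"H\",\"USA\"\n\"R\",\"1\"\n\"R\",\"2\"\n\"S\",\"1\"\n";
        for (String line : linesStartingWithR(content)) {
            System.out.println(line);
        }
    }
}
```

Note that the loop calls find() repeatedly. A single find() on the whole content would return only the first line beginning with "R.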


robi

-----Original Message-----
From: Bryan Bende [mailto:bbende@gmail.com] 
Sent: Tuesday, September 08, 2015 1:03 PM
To: users@nifi.apache.org
Subject: Re: ExtractText usage

Chris,

I think the issue is that ExtractText is not reading the file line by line, and then applying
your pattern to each line. It is applying the pattern to the whole content of the file so
you would need a regex that repeated the pattern you were looking for so that it captured
multiple times.

When I tested your example, it was actually extracting the first match 3 times, which I think
is because of the following:
- It always puts the first match in the property base name, in this case "regex",
- then it puts the entire match in index 0, in this case regex.0, and in this case it is only
matching the first occurrence,
- and then all of the matches would be in order after that, starting with index 1, which in
this case there is only 1 match so it is just regex.1
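
The behaviour described above can be imitated in plain Java. This is only a sketch of the attribute layout as described in this message, not ExtractText's actual code; the class and method names are hypothetical, and the sample content is shortened.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatchDemo {

    // Applies `regex` once to the whole content and lays the result out as
    // <name>, <name>.0, <name>.1, ... in the way the message describes.
    static Map<String, String> extract(String name, String regex, String content) {
        Map<String, String> attrs = new LinkedHashMap<>();
        // MULTILINE stands in for "Enable Multiline = True".
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(content);
        if (m.find()) {                         // one find(): first occurrence only
            if (m.groupCount() >= 1) {
                attrs.put(name, m.group(1));    // base name gets the first capture
            }
            for (int i = 0; i <= m.groupCount(); i++) {
                attrs.put(name + "." + i, m.group(i)); // .0 is the entire match
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        String content = "\"H\",\"USA\"\n\"R\",\"1\"\n\"R\",\"2\"\n";
        // regex, regex.0 and regex.1 all hold the first "R" line.
        extract("regex", "^(\"R.*)$", content)
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Because the pattern's only capture group spans the entire match, the base name, index 0 and index 1 all carry the same text, which is why the first line appears three times.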


Another solution that might be simpler is to put a SplitText processor between GetFile and ExtractText,
and set the Line Split Count to 1. This will send 1 line at a time to your ExtractText processor,
which would then match only the lines starting with 'R'.
The downside is that all of the lines with 'R' would be in different FlowFiles, but this may
or may not matter depending on what you wanted to do with them afterwards.
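
As a rough analogy in plain Java (not NiFi API code; the class and method names are hypothetical), splitting first and matching afterwards looks like this. Each element of the returned list stands in for one FlowFile that matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineByLineDemo {

    // Splits the content into single lines (the SplitText step), then
    // applies the pattern to each line on its own (the ExtractText step).
    static List<String> matchPerLine(String content) {
        // No multiline flag needed: each input is already a single line.
        Pattern p = Pattern.compile("^(\"R.*)$");
        List<String> matched = new ArrayList<>();
        for (String line : content.split("\r?\n")) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                matched.add(m.group(1)); // one entry per matching line
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        String content = "\"H\",\"USA\"\n\"R\",\"1\"\n\"R\",\"2\"\n\"S\",\"1\"\n";
        for (String line : matchPerLine(content)) {
            System.out.println(line);
        }
    }
}
```

The trade-off is the one noted above: every matching line ends up as a separate item, so anything that needs them together has to merge them again downstream.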

-Bryan


On Tue, Sep 8, 2015 at 12:12 PM, Christopher Wilson <wilsoncj1@gmail.com> wrote:


	I'm trying to read a directory of .csv files which have 3 different schemas/list types (not
my idea).  The descriptor is in the first column of the csv file.  I'm reading the files in
using GetFile and passing them into ExtractText, but I'm only getting the first 3 (of 8) lines
matching my first regex.  What I want to do is grab all the lines beginning with "R" and dump
them off to a file (for now).  My end goal would be to loop through these files, grab lines or
blocks of lines by regex, and route them downstream based on that regex.
	
	Details and first 11 lines of a sample file below.
	
	
	Thanks in advance.
	
	
	-Chris
	

	NiFi version: 0.2.1
	
	OS: Ubuntu 14.01
	
	JVM: java-1.7.0-openjdk-amd64
	
	
	ExtractText:
	
	
	Enable Multiline = True
	
	Enable Unix Lines Mode = True
	regex = ^("R.*)$
	


	"H","USA","BP","20140502","9","D","BP"
	"R","1","TB","CLM"," "," ","3U"," ","47000","0","47000","0"," ","0"," ","0"," ","0"," ","0","
","0"," ","0"," ","0","25000","25000"," ","650","F","D","D","6"," "," "," ","1:20PM ","1:51PM
","0122"," ","Clm 25000","Fast","","16","87"," ","","","64","117.39","2266","4648","11129","0","0","
","","112089","Good","Cloudy","","","Y"
	"R","2","TB","CLM"," ","B","3U"," ","34000","0","34000","0"," ","0"," ","0"," ","0"," ","0","
","0"," ","0"," ","0","25000","25000"," ","600","F","D","D","7"," "," "," ","1:51PM ","2:22PM
","0151"," ","Clm 25000N2L","Fast","","16","79"," ","","","64","112.36","2444","4803","10003","0","0","
","","261868","Poor","Cloudy","","","Y"
	"R","3","TB","STK","S"," ","3U"," ","100000","0","100000","0","A","100000"," ","0"," ","0","
","0"," ","0"," ","0"," ","0","0","0"," ","600","F","D","D","6"," ","Affirmed Success S.","AfrmdScsB","2:22PM
","2:53PM ","0222"," ","AfrmdScsB100k","Fast","","16","88"," ","","","64","110.54","2323","4618","5810","0","0","
","","259015","5","Clear","","","Y"
	"R","4","TB","MCL"," "," ","3U"," ","49200","0","49200","0"," ","0"," ","0"," ","0"," ","0","
","0"," ","0"," ","0","40000","40000"," ","850","F","D","D","8"," "," "," ","2:53PM ","3:24PM
","0253"," ","Md 40000","Fast","Y","30","72"," ","","","64","145.58","2425","4829","11358","13909","0","
","","260343","9","Clear","0","","Y"
	"R","5","TB","ALW"," "," ","3U"," ","77000","0","77000","0"," ","0"," ","0"," ","0"," ","0","
","0"," ","0"," ","0","0","0"," ","900","F","D","D","7"," "," "," ","3:24PM ","3:55PM ","0325","
","Alw 77000N1X","Fast","Y","30","74"," ","","","64","151.69","2330","4643","11156","13832","0","
","","302065","Good","Clear","","","Y"
	"R","6","TB","MSW","S","B","3U"," ","60000","1200","60000","0"," ","0"," ","0"," ","0","
","0"," ","0"," ","0"," ","0","0","0"," ","800","F","D","D","5"," "," "," ","3:55PM ","4:26PM
","0355"," ","Md Sp Wt 58k","Fast","","30","61"," ","","","64","140.64","2481","4931","11477","0","0","
","","161404","Good","Clear","","","Y"
	"R","7","TB","CLM"," ","B","3U"," ","40000","0","40000","0"," ","0"," ","0"," ","0"," ","0","
","0"," ","0"," ","0","20000","20000"," ","800","F","D","D","6"," "," "," ","4:26PM ","4:57PM
","0427"," ","Clm 20000","Fast","","30","68"," ","","","64","139.31","2337","4770","11402","0","0","
","","344306","Good","Clear","","","Y"
	"R","8","TB","ALW"," ","B","3U"," ","77000","0","77000","0"," ","0"," ","0"," ","0"," ","0","
","0"," ","0"," ","0","0","0"," ","850","F","D","D","7"," "," "," ","4:57PM ","5:28PM ","0457","
","Alw 77000N1X","Fast","","30","76"," ","","","64","144.76","2416","4847","11365","13836","0","
","","213021","Good","Clear","","","Y"
	"R","9","TB","STR"," "," ","3U"," ","60000","0","60000","0"," ","0"," ","0"," ","0"," ","0","
","0"," ","0"," ","0","0","40000"," ","700","F","D","D","8"," "," "," ","5:28PM ","      
","0528"," ","Alw 40000s","Fast","Y","16","81"," ","","","64","124.66","2339","4740","11211","0","0","
","","332649","6,8","Clear","0","","Y"
	"S","1","000008813341TB","Coolusive","20100124","KY","TB","Colt","Bay","Ice Cool Kitty","2003","TB","Elusive
Quality","1993","TB","Tomorrows Cat","1995","TB","Gone West","1984","TB","122","0","L","","28200","Velasquez","Cornelio","H.","
","Jacobson","David"," ","Drawing Away Stable and Jacobson, David"," "," ","265","N"," ","0","N","5","5","3","3","4","0","0","1","1","1","10","200","0","0","100","75","510","320","0","0","0","0","N","25000","4w
into lane, held","chase 2o turn, bid 4w turning for home,took over, held sway","7.30","3.80","2.70","Y","000000002103TE","TE","Barbara","Robert","
","000001976480O6","O6","Averill","Bradley","E."," ","N","0","N","","0","","87","Lansdon B.
Robbins & Kevin Callahan","000000257611TE","000000002695JE"
	
	

