nifi-dev mailing list archives

From Andy LoPresto <alopre...@apache.org>
Subject Re: FuzzyHashContent/CompareFuzzyHash processor
Date Mon, 09 Oct 2017 18:04:14 GMT
You need to extract the relevant fields and either modify the flowfile content inline (losing
the other data) or create a new flowfile (you can still retain the complete content in the
“original” flowfile) and pass the flowfile with only the content you want to perform the
hash on to the FuzzyHashContent processor.

For the data you have provided (I’m assuming it is a single line of values, rather than a
header row followed by many lines), you could use a ReplaceText processor to drop the unrelated
columns. If the flowfile content contains multiple rows, you can use a CSVReader/ScriptedReader
and a CSVRecordSetWriter/ScriptedRecordSetWriter in conjunction with an UpdateRecord processor
to reduce the content to just the relevant fields, then use a SplitRecord processor
to generate one flowfile per row, and pass all of them to FuzzyHashContent.
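
As a rough illustration of the transformation described above (not NiFi configuration), the following Python sketch applies a regular expression of the kind a ReplaceText search/replace could use to keep only Col2 and Col3, and then emits one reduced string per row, which is roughly what the split step would produce. The sample data, the whitespace-delimited layout, and the names `reduce_line` and `split_rows` are assumptions made for this example only.

```python
import re

# Hypothetical sample mirroring the whitespace-delimited layout from the question.
SAMPLE = """Col1      Col2      Col3      Col4      Col5
Test1    Test2     Test3     Test4     Test5
"""

# Skip the first column, capture the second and third, ignore the rest of the line.
KEEP_COL2_COL3 = re.compile(r"^\S+\s+(\S+)\s+(\S+).*$")


def reduce_line(line):
    """Return 'Col2 Col3' for one row, or None if the row has fewer than three columns."""
    match = KEEP_COL2_COL3.match(line)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


def split_rows(content, has_header=True):
    """Return one reduced string per data row (analogous to one flowfile per row)."""
    lines = [ln for ln in content.splitlines() if ln.strip()]
    if has_header:
        lines = lines[1:]  # drop the header row before reducing
    reduced = (reduce_line(ln) for ln in lines)
    return [r for r in reduced if r is not None]


if __name__ == "__main__":
    for row in split_rows(SAMPLE):
        print(row)
```

Each string returned by `split_rows` corresponds to the content that would be handed to FuzzyHashContent, so the hash covers only the two fields of interest.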


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Oct 9, 2017, at 4:19 AM, shankhamajumdar <shankha.majumdar@lexmark.com> wrote:
> 
> Hi Andy,
> 
> Thanks for the reply. But I am still not able to solve my use case. For
> example
> 
> I have a data file in the below structure.
> 
> Col1      Col2      Col3      Col4      Col5
> 
> Test1    Test2     Test3     Test4     Test5
> 
> I want to do a fuzzy matching on Col2 and Col3 and generate an output file.
> 
> I am using getFile and FuzzyHashContent processor but not able to design the
> flow. Need your help on this.
> 
> Regards,
> Shankha
> 
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/

