nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: How to validate records in Hadoop using NiFi?
Date Sun, 10 Jan 2016 03:47:16 GMT
Hello Sudeep,

"Which NiFi processor can I use to split each record (separated by a
new line character)"

  For this, the SplitText processor is helpful if you want to split
each line.  I recommend chaining two SplitText processors, where the
first splits on every 1000 lines, for example, and the next one
splits each line.  As long as you have back pressure set up, this
means you can split arbitrarily large source files (in terms of
number of lines) and still get good behavior.
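
  To illustrate the two-stage idea outside of NiFi, here is a minimal
Python sketch.  It is not NiFi code, and the function names are
hypothetical.  The first stage groups a stream of lines into chunks of
at most 1000, and the second stage emits each line of a chunk on its
own, so only one chunk needs to be held at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_into_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """First stage (hypothetical helper): yield lists of at most chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def split_into_lines(chunks: Iterable[List[str]]) -> Iterator[str]:
    """Second stage (hypothetical helper): yield every line of every chunk."""
    for chunk in chunks:
        yield from chunk


if __name__ == "__main__":
    # A generated source stands in for a large file; nothing is read from disk.
    source = (f"record-{i}" for i in range(2500))
    total = sum(1 for _ in split_into_lines(split_into_chunks(source, 1000)))
    print(total)
```

  Because both stages are generators, the downstream stage pulls work
only when it is ready, which is loosely analogous to what back pressure
gives you between the two processors.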

..."and perform validations?"

  Consider whether you want to validate each line in a text file and
route valid lines one way and invalid lines another way.  If so, you
may be able to avoid SplitText entirely and simply use RouteText
instead, as it can operate on the original file in a line-by-line
manner and perform expression-based validation.  This would operate
in bulk and be quite efficient.
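
  As a rough picture of line-by-line routing, here is a minimal Python
sketch.  It is an analogy only and does not use NiFi or its expression
syntax.  The comma-separated layout, the rule that the third column
must be numeric, and both function names are assumptions made for the
example.

```python
from typing import Callable, Iterable, List, Tuple


def route_lines(
    lines: Iterable[str], is_valid: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: send each line to a 'valid' or 'invalid' bucket."""
    valid: List[str] = []
    invalid: List[str] = []
    for line in lines:
        (valid if is_valid(line) else invalid).append(line)
    return valid, invalid


def third_column_is_numeric(line: str) -> bool:
    """Assumed rule: the record is comma-separated and column 3 holds digits."""
    cols = line.split(",")
    return len(cols) >= 3 and cols[2].strip().isdigit()


if __name__ == "__main__":
    records = ["1,alice,10", "2,bob,x", "3,carol"]
    good, bad = route_lines(records, third_column_is_numeric)
    print(good)
    print(bad)
```

  The whole input is handled in one pass without first breaking it
into one object per line, which is the reason the single-processor
approach can be cheaper than splitting.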

"For validations I want to verify a particular column value for each
record using a SQL query"

  Our ExecuteSQL processor is designed for executing SQL against a
JDBC-accessible database.  At this point it is not helpful for
executing queries on line-oriented data, even if that data were
itself valid DML.  It is an interesting idea, but not something we
support at this time.

I'm interested in understanding your case better, if you don't mind.
You mention you're getting data from Sqoop into HDFS.  How is NiFi
involved in that flow?  Are you pulling the data into NiFi after it
lands in HDFS?

Thanks
Joe

On Sat, Jan 9, 2016 at 10:32 PM, sudeep mishra <sudeepshekharm@gmail.com> wrote:
> Hi,
>
> I am pushing some database records into HDFS using Sqoop.
>
> I want to perform some validations on each record in the HDFS data. Which
> NiFi processor can I use to split each record (separated by a new line
> character) and perform validations?
>
> For validations I want to verify a particular column value for each record
> using a SQL query. I can see an ExecuteQuery processor. How can I
> dynamically pass query parameters to it? Also, is there a way to execute the
> queries in bulk rather than for each record?
>
> Kindly suggest.
>
> Appreciate your help.
>
>
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
> +91-9167519029
> sudeepshekharm@gmail.com
