nifi-dev mailing list archives

From shweta soni <sssmb...@gmail.com>
Subject Need help regarding some challenges while using Apache Nifi
Date Thu, 25 Jun 2020 18:06:25 GMT
Hello Team,



We are using NiFi in our data ingestion process. The version details
are: *NiFi 1.11.4, Cloudera Enterprise 5.16.2, and Hive 1.1*. I posted
my issues on the NiFi Slack channel but did not get answers to some of
my questions, so I am posting all my queries here in the ardent hope
of getting solutions or workarounds for them. We are facing the issues
below:





1.       *SCENARIO*: Our RDBMS source has Date/Timestamp columns, and the
Hive destination also has Date/Timestamp columns, but when we try to
ingest from source to destination we get "IntWritable/LongWritable
cannot be written to Date/Timestamp" errors in Hue. We are using the
following processors: QueryDatabaseProcessor → UpdateRecord (column
mapping and output schema) → PutHDFS → ReplaceText → PutHiveQL. Below
is the Avro output schema; since Avro has no native Date or Timestamp
datatype, we use logical types.

       {"name":"dob","type":["null",{"type":"long","logicalType":"timestamp-millis"}]}

       {"name":"doA","type":["null",{"type":"int","logicalType":"date"}]}


       *Q. Please let me know how we can put date/timestamp source
columns into date/timestamp destination columns.*
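
For reference, here is a minimal sketch (plain Python, outside NiFi) of what those two logical types actually store, which helps when checking the long/int values the flow writes. Note that older versions of the Hive AvroSerDe may not honor Avro logical types, which can surface exactly this kind of Writable-mismatch error, so verifying the raw values is a useful first step:

```python
from datetime import date, datetime, timezone

# Avro's timestamp-millis logical type stores a long:
# milliseconds since the Unix epoch, in UTC.
def to_timestamp_millis(dt: datetime) -> int:
    return int(dt.replace(tzinfo=timezone.utc).timestamp()) * 1000

# Avro's date logical type stores an int:
# whole days since the Unix epoch (1970-01-01).
def to_epoch_days(d: date) -> int:
    return (d - date(1970, 1, 1)).days

to_epoch_days(date(2020, 6, 25))                        # -> 18438
to_timestamp_millis(datetime(2020, 6, 25, 18, 6, 25))   # -> 1593108385000
```

If the longs/ints in the Avro files match these conventions but Hive still rejects them, the mismatch is on the Hive/SerDe side rather than in the NiFi flow.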






2.       *SCENARIO*: Decimal data is not being inserted into the ORC table.

Solution: I load the data into an Avro table and then do INSERT INTO
the ORC table from it. I found this solution on the Cloudera community
forum.


 *Q. Is there any other solution for loading decimal data into an ORC table?*
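
As background for why the Avro staging step matters: Avro's decimal logical type stores only the unscaled integer, as big-endian two's-complement bytes, with precision and scale carried in the schema rather than in the data. A minimal sketch of that encoding (plain Python, not part of any NiFi processor), useful for checking what the flow actually writes:

```python
from decimal import Decimal

# Avro decimal logical type: the unscaled value as big-endian
# two's-complement bytes; precision/scale live only in the schema.
def encode_avro_decimal(value: Decimal, scale: int) -> bytes:
    unscaled = int(value.scaleb(scale))            # 12.34, scale 2 -> 1234
    length = max(1, (unscaled.bit_length() + 8) // 8)  # leave room for the sign bit
    return unscaled.to_bytes(length, byteorder="big", signed=True)

encode_avro_decimal(Decimal("12.34"), 2)   # -> b'\x04\xd2'
encode_avro_decimal(Decimal("-0.01"), 2)   # -> b'\xff'
```

If the bytes in the staged Avro files look correct by this rule, the Avro-then-INSERT workaround is reasonable; the ORC writer path in your Hive version is the part that cannot consume the decimal directly.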






*3.       **SCENARIO**: *We have a one-time full-load flow in NiFi:
QueryDatabase → PutHiveQL → LogAttribute. This acts as a pipeline in
our custom UI and will run only once. In the NiFi UI we can manually
start the processors to start the flow, and once all the flowfiles are
processed and the success queue of PutHiveQL becomes empty, we can
stop the processors. But now we want to know programmatically that
this flow ended at a particular time, so that we can show the pipeline
status as completed in our custom UI. How can we achieve this?



*        Q. *Since NiFi is designed for continuous data transfer, how
can we know that a particular flow has ended?
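
One common approach (a sketch, not official NiFi guidance) is to have the custom UI poll NiFi's REST API for the status of the connection feeding PutHiveQL and treat zero queued flowfiles as completion. The JSON path below matches the connection-status entity in recent NiFi versions, but please verify it against your 1.11.4 instance; the URL and connection id are placeholders:

```python
import json
import urllib.request

# Placeholder endpoint -- adjust host/port and supply your connection id.
STATUS_URL = "http://localhost:8080/nifi-api/flow/connections/{id}/status"

def queued_count(status_entity: dict) -> int:
    # Connection-status entities nest the live count under aggregateSnapshot.
    return int(status_entity["connectionStatus"]
                            ["aggregateSnapshot"]
                            ["flowFilesQueued"])

def flow_finished(connection_id: str) -> bool:
    # Poll this from your custom UI; True once the queue has drained.
    with urllib.request.urlopen(STATUS_URL.format(id=connection_id)) as resp:
        return queued_count(json.load(resp)) == 0
```

Once it returns True (after the source processor has stopped producing), you can record the completion time, mark the pipeline as completed, and optionally stop the processors through the same REST API.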






4.       *SCENARIO*: I have a Hive table with complex datatypes, i.e.
Array and Map. When I try to get this data via the SelectHiveQL
processor, it gives output in String format for all the columns. The
next UpdateRecord processor then gives an error that the String
datatype cannot be converted to Array or Map.

Avro Output Schema:

{"type": "array", "items": "double"}

{"type": "map", "values": "int"}


 *Q. How do we handle complex datatypes in Hive via NiFi, with a Hive
source table and another Hive table as the destination?*
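
If the String that SelectHiveQL produces for complex columns turns out to be JSON-like (verify this against your actual data; it is an assumption here), one workaround is to parse it back into real array/map values in a scripted step (e.g. ExecuteScript) before the record processors see it. A minimal sketch:

```python
import json

# SelectHiveQL returns complex columns as strings; when the string is
# JSON-like (e.g. '[1.0, 2.0]' or '{"a": 1}'), parse it back into a
# real list/dict so downstream record processing can treat it as
# Array/Map instead of String.
def parse_complex(cell):
    try:
        return json.loads(cell)
    except (json.JSONDecodeError, TypeError):
        return cell  # leave non-JSON strings (and None) untouched

parse_complex('[1.0, 2.0]')   # -> [1.0, 2.0]
parse_complex('{"a": 1}')     # -> {'a': 1}
```

If the rendering is not valid JSON (Hive sometimes prints unquoted map keys), the parser would need to be adapted to the exact string format you observe.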






5.       *SCENARIO*: The QueryDatabase processor has a Maximum-value
Columns property which enables incremental load, but there is no such
functionality for Hive table incremental load (i.e. SelectHiveQL). I
tried the GenerateTableFetch and QueryDatabase processors with the
Hive1_1 connection service, but it does not work. On the NiFi Slack
channel I was told to raise a JIRA for a new
GenerateHiveTableFetch/QueryHiveDatabase processor.



*Q. Is there any alternative for handling Hive table incremental load,
or should I go ahead and raise a JIRA for it?*
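
Until a dedicated processor exists, one workaround is to implement the max-value pattern by hand: persist the last-seen maximum somewhere durable (a cache service or your own store) and generate each SelectHiveQL query from it. A minimal sketch of the query-generation half, with hypothetical table and column names:

```python
# Hand-rolled version of the max-value incremental pattern: the first
# run does a full load; later runs filter on the persisted maximum of
# the tracking column. "orders"/"updated_at" are illustrative names.
def next_incremental_query(table: str, max_col: str, last_max) -> str:
    if last_max is None:
        return f"SELECT * FROM {table}"          # initial full load
    return f"SELECT * FROM {table} WHERE {max_col} > {last_max!r}"

next_incremental_query("orders", "updated_at", None)
# -> "SELECT * FROM orders"
next_incremental_query("orders", "updated_at", "2020-06-25 18:06:25")
# -> "SELECT * FROM orders WHERE updated_at > '2020-06-25 18:06:25'"
```

After each run, query the new maximum of the tracking column and store it for the next cycle. Raising a JIRA for a dedicated processor still seems worthwhile in parallel.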



We request you to please help us. Thanking you in anticipation.





Thanks & Regards,
Shweta Soni
