nifi-users mailing list archives

From Oleg Zhurakousky
Subject Re: CSV/delimited to Parquet conversion via Nifi
Date Wed, 23 Mar 2016 03:05:30 GMT
I actually can't agree more...
IMHO, and especially given the recent Spring integration effort, this suggests the idea that a
Processor may have its own context-based extension mechanism. For example, here the context
is 'transformation':

public interface Transformer<I,O> {
    O transform(I value);
}

And the processor exposes the location of the actual implementation and its required dependencies.
I do understand that it may go a bit against the grain of the NiFi idea of specialized, ready-to-use
components that only need to be configured, but with the proper design it can be done.
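
To make the idea a bit more concrete, here is a minimal sketch of how a processor might turn a configured class name into a Transformer instance. This is only an illustration under stated assumptions, not existing NiFi code: UpperCaseTransformer, TransformerLoader and TransformerDemo are hypothetical names, and loading the implementation's dependencies from a configured location is left out entirely.

```java
import java.util.Locale;

// The interface from the message above, repeated so the sketch compiles on its own.
interface Transformer<I, O> {
    O transform(I value);
}

// Hypothetical user-supplied implementation; the name is illustrative only.
class UpperCaseTransformer implements Transformer<String, String> {
    public UpperCaseTransformer() { }

    @Override
    public String transform(String value) {
        return value.toUpperCase(Locale.ROOT);
    }
}

// Hypothetical helper standing in for what a processor might do with a
// class name taken from its configuration.
class TransformerLoader {
    @SuppressWarnings("unchecked")
    static <I, O> Transformer<I, O> load(String className) {
        try {
            Class<?> type = Class.forName(className);
            if (!Transformer.class.isAssignableFrom(type)) {
                throw new IllegalArgumentException(className + " does not implement Transformer");
            }
            // Requires a no-argument constructor on the implementation.
            return (Transformer<I, O>) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load transformer " + className, e);
        }
    }
}

class TransformerDemo {
    public static void main(String[] args) {
        Transformer<String, String> t =
                TransformerLoader.load(UpperCaseTransformer.class.getName());
        System.out.println(t.transform("a|b|c")); // prints "A|B|C"
    }
}
```

In this sketch the processor itself stays generic and the only thing that changes per flow is the configured class name, which is the trade-off against ready-to-use components mentioned above.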

Just a thought

On Mar 22, 2016, at 9:50 PM, Tony Kurc wrote:

Interesting question. A couple of discussion points: If we start doing a processor for each of
these conversions, it may become unwieldy (P(x,2) processors, where x is the number of data formats?)
I'd say maybe a more general ConvertFormat processor may be appropriate, but then configuration
and code complexity may suffer. If there is a canonical internal data form and a bunch (2*x)
of convertXtocanonical and convertcanonicaltoX processors, the flow could get complex and
the extra transform could be expensive.
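
For a rough sense of the numbers above, here is a small sketch comparing the two counts: P(x,2) = x*(x-1) direct converters versus 2*x converters through a canonical form. The class and method names are made up for illustration.

```java
class ProcessorCount {
    // One processor per ordered pair of distinct formats: P(x, 2) = x * (x - 1).
    static long direct(int formats) {
        return (long) formats * (formats - 1);
    }

    // One converter into and one out of the canonical form per format: 2 * x.
    static long viaCanonical(int formats) {
        return 2L * formats;
    }

    public static void main(String[] args) {
        for (int x : new int[] {2, 3, 5, 10}) {
            System.out.printf("x=%d direct=%d canonical=%d%n",
                    x, direct(x), viaCanonical(x));
        }
    }
}
```

With 5 formats that is 20 direct processors against 10 canonical ones, and with 10 formats it is 90 against 20. The two counts are equal at x = 3, so the canonical approach only reduces the processor count beyond three formats, and it still pays for the extra transform on every conversion.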

On Mar 21, 2016 9:39 PM, "Dmitry Goldenberg" wrote:
Since NiFi has ConvertJSONToAvro and ConvertCSVToAvro processors, would it make sense to add
a feature request for a ConvertJsonToParquet processor and a ConvertCsvToParquet processor?

- Dmitry

On Mon, Mar 21, 2016 at 9:23 PM, Matt Burgess wrote:

NIFI-1663 [1] was created to add ORC support to NiFi. If you have a target dataset that has
been created with Parquet format, I think you can use ConvertCSVToAvro followed by StoreInKiteDataset
to get flow files in Parquet format into Hive, HDFS, etc. Others in the community know a lot
more about the StoreInKiteDataset processor than I do.



On Mon, Mar 21, 2016 at 8:25 PM, Edmon Begoli wrote:

Is there a way to do a straight CSV (PSV) to Parquet or ORC conversion via NiFi, or do I always
need to push the data through one of the "data engines" - Drill, Spark, Hive, etc.?
