nifi-users mailing list archives

From Oleg Zhurakousky <ozhurakou...@hortonworks.com>
Subject Re: CSV/delimited to Parquet conversion via Nifi
Date Wed, 23 Mar 2016 03:05:30 GMT
I actually can’t agree more.
IMHO, and especially in light of the recent Spring integration effort, this suggests an idea where a
Processor may have its own context-based extension mechanism. For example, here the context
is 'transformation':

public interface Transformer<I,O> {
    O transform(I value);
}

And the processor exposes the location of the actual implementation and its required dependencies.
I do understand that it may go a bit against the grain of the NiFi idea of specialized, ready-to-use
components that only need to be configured, but with the proper design it can be done.
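To make the idea a bit more concrete, here is a minimal sketch of what a user-supplied implementation and a processor-side lookup might look like. Only the Transformer interface comes from the message above; CsvLineToFields and TransformerLoader are hypothetical names used for illustration, and the lookup assumes the implementation is already on the classpath (a real processor would also have to load the dependencies from the configured location).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Same shape as the interface proposed in the message above.
interface Transformer<I, O> {
    O transform(I value);
}

// Hypothetical user-supplied implementation: splits one CSV line
// into a list of trimmed field values.
final class CsvLineToFields implements Transformer<String, List<String>> {
    @Override
    public List<String> transform(String value) {
        // A limit of -1 keeps trailing empty fields.
        return Arrays.stream(value.split(",", -1))
                .map(String::trim)
                .collect(Collectors.toList());
    }
}

// Hypothetical stand-in for the processor side: it is configured with
// the name of the implementation class and instantiates it reflectively.
// Assumption: the class has a no-argument constructor and is visible
// to the current class loader.
final class TransformerLoader {
    @SuppressWarnings("unchecked")
    static <I, O> Transformer<I, O> load(String className)
            throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return (Transformer<I, O>) type.getDeclaredConstructor().newInstance();
    }
}
```

With something like this, the processor itself would stay generic and only the configured class name would change per conversion.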

Just a thought
Oleg


On Mar 22, 2016, at 9:50 PM, Tony Kurc <trkurc@gmail.com> wrote:


Interesting question. A couple of discussion points: if we start writing a processor for each of
these conversions, it may become unwieldy (P(x,2) processors, where x is the number of data formats?).
I'd say a more general ConvertFormat processor may be appropriate, but then configuration
and code complexity may suffer. If there is a canonical internal data form and a bunch (2*x)
of convertXtocanonical and convertcanonicaltoX processors, the flow could get complex and
the extra transform could be expensive.
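
As a rough illustration of the two counts mentioned above, the sketch below compares one processor per ordered pair of formats, P(x,2) = x*(x-1), with the 2*x processors needed when everything goes through a canonical form. The class and method names are hypothetical, and the counts say nothing about the runtime cost of the extra transform.

```java
// Hypothetical helper comparing the two processor counts.
final class ProcessorCount {
    // One processor per ordered pair of distinct formats: P(x, 2) = x * (x - 1).
    static long pairwise(int formats) {
        return (long) formats * (formats - 1);
    }

    // One to-canonical and one from-canonical processor per format: 2 * x.
    static long viaCanonical(int formats) {
        return 2L * formats;
    }

    public static void main(String[] args) {
        for (int x = 2; x <= 6; x++) {
            System.out.printf("formats=%d pairwise=%d viaCanonical=%d%n",
                    x, pairwise(x), viaCanonical(x));
        }
    }
}
```

The two counts are equal at three formats; beyond that the pairwise count grows quadratically while the canonical approach grows linearly.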

On Mar 21, 2016 9:39 PM, "Dmitry Goldenberg" <dgoldenberg123@gmail.com> wrote:
Since NiFi has ConvertJsonToAvro and ConvertCsvToAvro processors, would it make sense to add
a feature request for a ConvertJsonToParquet processor and a ConvertCsvToParquet processor?

- Dmitry

On Mon, Mar 21, 2016 at 9:23 PM, Matt Burgess <mattyb149@gmail.com> wrote:
Edmon,

NIFI-1663 [1] was created to add ORC support to NiFi. If you have a target dataset that has
been created with Parquet format, I think you can use ConvertCSVToAvro then StoreInKiteDataset
to get flow files in Parquet format into Hive, HDFS, etc. Others in the community know a lot
more about the StoreInKiteDataset processor than I do.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-1663

On Mon, Mar 21, 2016 at 8:25 PM, Edmon Begoli <ebegoli@gmail.com> wrote:

Is there a way to do a straight CSV (PSV) to Parquet or ORC conversion via NiFi, or do I always
need to push the data through one of the "data engines" - Drill, Spark, Hive, etc.?





