flink-dev mailing list archives

From Fabian Hueske <fhue...@apache.org>
Subject Re: How to use org.apache.hadoop.mapreduce.lib.input.MultipleInputs in Flink
Date Sat, 17 Jan 2015 19:46:49 GMT
Why don't you just create two data sources that each wrap the Parquet format
in a HadoopInputFormat and join them, as is done, for example, in the TPCH Q3
example [1]?
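
In code, the gist would be something like the following (untested sketch
against the 0.8 Java API; RecordA/RecordB, the getId() join key, the
AvroParquetInputFormat choice, and the HDFS paths are placeholders for
whatever your Parquet files actually contain, and the HadoopInputFormat
package has moved between Flink versions):

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
// in 0.9+ this class lives in org.apache.flink.api.java.hadoop.mapreduce
import org.apache.flink.hadoopcompatibility.mapreduce.HadoopInputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
// placeholder: use whichever Parquet input format matches how the files were
// written (older parquet-mr versions use the parquet.avro package instead)
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ParquetJoinExample {

  public static void main(String[] args) throws Exception {
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // First table: wrap the Parquet input format in its own Hadoop Job,
    // so each source gets its own input path and configuration.
    Job jobA = Job.getInstance();
    FileInputFormat.addInputPath(jobA, new Path("hdfs:///tables/a"));
    HadoopInputFormat<Void, RecordA> sourceA = new HadoopInputFormat<Void, RecordA>(
        new AvroParquetInputFormat<RecordA>(), Void.class, RecordA.class, jobA);
    DataSet<Tuple2<Void, RecordA>> tableA = env.createInput(sourceA);

    // Second table: same pattern, separate Job.
    Job jobB = Job.getInstance();
    FileInputFormat.addInputPath(jobB, new Path("hdfs:///tables/b"));
    HadoopInputFormat<Void, RecordB> sourceB = new HadoopInputFormat<Void, RecordB>(
        new AvroParquetInputFormat<RecordB>(), Void.class, RecordB.class, jobB);
    DataSet<Tuple2<Void, RecordB>> tableB = env.createInput(sourceB);

    // Join the two data sets on a common key (here: a hypothetical getId() field).
    DataSet<Tuple2<RecordA, RecordB>> joined = tableA.join(tableB)
        .where(new KeySelector<Tuple2<Void, RecordA>, Long>() {
          @Override
          public Long getKey(Tuple2<Void, RecordA> t) { return t.f1.getId(); }
        })
        .equalTo(new KeySelector<Tuple2<Void, RecordB>, Long>() {
          @Override
          public Long getKey(Tuple2<Void, RecordB> t) { return t.f1.getId(); }
        })
        .with(new JoinFunction<Tuple2<Void, RecordA>, Tuple2<Void, RecordB>, Tuple2<RecordA, RecordB>>() {
          @Override
          public Tuple2<RecordA, RecordB> join(Tuple2<Void, RecordA> a, Tuple2<Void, RecordB> b) {
            return new Tuple2<RecordA, RecordB>(a.f1, b.f1);
          }
        });

    joined.print();
    // on 0.9+ print() triggers execution itself, so drop the explicit execute()
    env.execute("Join two Parquet tables");
  }
}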

I have always found MultipleInputs to be an ugly workaround for Hadoop's
inability to read data from multiple sources within a single job.
AFAIK, Hadoop's MultipleInputs does not provide any data colocation that a
join could exploit. Or is there some other beneficial property that I am not
aware of?

[1]
https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/TPCHQuery3.java



2015-01-17 20:15 GMT+01:00 Felix Neutatz <neutatz@googlemail.com>:

> Hi,
>
> is there any example that shows how to load several files with
> different Hadoop input formats at once? My use case is that I want to load
> two tables (in Parquet format) via Hadoop and join them within Flink.
>
> Best regards,
>
> Felix
>
