flink-dev mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Read JSON file as input
Date Tue, 26 Apr 2016 07:41:31 GMT
Hi Punit,

JSON can be hard to parse in parallel due to its nested structure. Whether
and how it can be done depends on the schema and the (textual)
representation of the JSON. The problem is that a parallel input format
needs to be able to identify record boundaries without context information.
This is easy if your JSON data is a list of JSON objects that are separated
by newline characters. However, this is hard to generalize. That's why
Flink does not offer tooling for it (yet).
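
To illustrate what "identifying record boundaries without context" means, here is a minimal sketch in plain Java. It is not Flink code: the class NewlineSplitReader and its method readSplit are hypothetical names made up for this example. It assumes one compact JSON object per line (no pretty-printing), so a raw newline byte can only ever be a record separator. A reader that is handed an arbitrary byte range skips forward to the next newline and then owns every record that starts inside its range:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Flink): reads the records of one byte range
// ("split") of a newline-delimited JSON file held in memory.
final class NewlineSplitReader {

    // Returns every record whose first byte lies in [start, end).
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        // If the split does not begin at a record start, skip the partial
        // record; the reader of the previous split is responsible for it.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> records = new ArrayList<>();
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            // A record may extend past 'end'; it is still read completely.
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            if (lineEnd > pos) {
                records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            }
            pos = lineEnd + 1;
        }
        return records;
    }
}
```

With three 9-byte lines and a split point at byte 13 (in the middle of the second object), the first reader returns objects 1 and 2 and the second reader returns object 3, so nothing is lost or read twice. For pretty-printed or arbitrarily nested JSON there is no such local marker, which is why the approach does not generalize.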

If your JSON objects are separated by newline characters, the easiest way
is to read the file as a text file, where each line results in a String, and
to parse each object using a standard JSON parser. This would look like:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<String> text = env.readTextFile("/path/to/jsonfile");
DataSet<YourObject> json = text.map(new YourMapFunctionWhichParsesJSON());

Best, Fabian

2016-04-26 8:06 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:

> Hi
>
> I am new to Flink. I was experimenting with the Dataset API and found out
> that there is no explicit method for loading a JSON file as input. Can
> anyone please suggest me a workaround?
>
> --
> Thank You
>
> Regards
>
> Punit Naik
>
