flink-dev mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Read JSON file as input
Date Tue, 26 Apr 2016 10:02:40 GMT
Hi,

you need to implement the MapFunction interface [1].
Inside the MapFunction, you can use any JSON parser library, such as Jackson,
to parse the String.
The exact parsing logic depends on your use case.

However, be careful not to create a new parser in every map() call, because
that would be quite expensive.
I recommend extending RichMapFunction and instantiating the parser once in
its open() method [2].
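A minimal sketch of the RichMapFunction approach described above, assuming the Flink 1.x DataSet API (where RichFunction still has open(Configuration)) and Jackson 2.x (com.fasterxml.jackson.databind) on the classpath. The class name ParseJsonLine and the two-field schema {"user": ..., "clicks": ...} are hypothetical; the code also assumes every line contains both fields.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

/**
 * Hypothetical example: parses one line of newline-delimited JSON into a
 * (user, clicks) tuple. The field names are an assumed schema.
 */
class ParseJsonLine extends RichMapFunction<String, Tuple2<String, Integer>> {

    // transient: not shipped with the serialized function; built in open().
    private transient ObjectMapper mapper;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Runs once per parallel task instance before the first map() call,
        // so one parser is reused for all records of that instance.
        mapper = new ObjectMapper();
    }

    @Override
    public Tuple2<String, Integer> map(String line) throws Exception {
        // Parse the line into a tree and pull out the two assumed fields.
        JsonNode node = mapper.readTree(line);
        return new Tuple2<>(node.get("user").asText(), node.get("clicks").asInt());
    }
}
```

It plugs into the pipeline from the quoted message below as `env.readTextFile("/path/to/jsonfile").map(new ParseJsonLine())`, which yields a `DataSet<Tuple2<String, Integer>>`. Mapping to a Flink tuple (or your own POJO) rather than emitting Jackson's JsonNode keeps the element type one that Flink can serialize efficiently.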

Best, Fabian

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/dataset_transformations.html#map
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html#specifying-transformation-functions

2016-04-26 10:44 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:

> Hi Fabian
>
> Thanks for the reply. Yes, my JSON is separated by new lines. It would have
> been great if you had explained the function that goes inside the map. I
> tried to use the 'scala.util.parsing.json._' library but had no luck.
>
> On Tue, Apr 26, 2016 at 1:11 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>
> > Hi Punit,
> >
> > JSON can be hard to parse in parallel due to its nested structure. Whether
> > and how it can be done depends on the schema and the (textual)
> > representation of the JSON. The problem is that a parallel input format
> > needs to be able to identify record boundaries without context
> > information. This is very easy if your JSON data is a list of JSON
> > objects separated by new line characters. However, this is hard to
> > generalize, which is why Flink does not offer tooling for it (yet).
> >
> > If your JSON objects are separated by new line characters, the easiest way
> > is to read the file as a text file, where each line results in a String,
> > and to parse each object using a standard JSON parser. This would look like:
> >
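> > import org.apache.flink.api.java.DataSet;
> > import org.apache.flink.api.java.ExecutionEnvironment;
> >
> > ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> >
> > // Each line of the file becomes one String record.
> > DataSet<String> text = env.readTextFile("/path/to/jsonfile");
> > // YourObject and YourMapFunctionWhichParsesJSON are placeholders for your own types.
> > DataSet<YourObject> json = text.map(new YourMapFunctionWhichParsesJSON());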
> >
> > Best, Fabian
> >
> > 2016-04-26 8:06 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:
> >
> > > Hi
> > >
> > > I am new to Flink. I was experimenting with the DataSet API and found out
> > > that there is no explicit method for loading a JSON file as input. Can
> > > anyone please suggest a workaround?
> > >
> > > --
> > > Thank You
> > >
> > > Regards
> > >
> > > Punit Naik
> > >
> >
>
>
>
> --
> Thank You
>
> Regards
>
> Punit Naik
>
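The record-boundary argument in Fabian's quoted reply can be illustrated with a small, self-contained sketch in plain Java (no Flink dependency). It is a simplified in-memory model, not Flink's actual input-format code: a byte array stands in for a file, and each range [start, end) plays the role of one parallel split. A record is taken to start at offset 0 or right after a '\n', so each split can decide where its own records begin using only the byte just before its start offset, with no other context. The class and method names are hypothetical. It assumes every JSON object sits on a single line; a pretty-printed object spanning several lines would be cut apart, which is exactly why the approach does not generalize to arbitrary JSON.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: reading newline-delimited records from one split. */
class NewlineSplitReader {

    /**
     * Returns every record whose first byte lies in [start, end).
     * The last record may extend past end; it is still read whole here,
     * and the next split skips it.
     */
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // If the split begins mid-record, that record belongs to the
        // previous split, so skip past the next '\n'.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the '\n'
        }
        // Read whole records as long as they start before end.
        while (pos < end && pos < data.length) {
            int nl = pos;
            while (nl < data.length && data[nl] != '\n') {
                nl++;
            }
            records.add(new String(data, pos, nl - pos, StandardCharsets.UTF_8));
            pos = nl + 1;
        }
        return records;
    }
}
```

For example, with three 7-character objects such as `{"a":1}`, each followed by '\n' (24 bytes in total), splitting at offset 12 gives the first two objects to the first split and the third to the second: every record is read exactly once, and each string can then be handed to a JSON parser.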
