flink-dev mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Read JSON file as input
Date Wed, 27 Apr 2016 07:04:46 GMT
You should do the parsing in a Map operator. Map applies the MapFunction to
each element in the DataSet.
So you can either implement another MapFunction or extend the one you have
to call the JSON parser.
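
As a rough illustration of that pattern (this is not Flink code): the sketch below uses plain java.util.stream as an analogy for the Map operator. FlatJsonParser and parseLines are hypothetical names, and the regex-based parser is only a stand-in for a real library such as Jackson; it assumes flat JSON objects with string values, one object per line. In a Flink job, the same parse call would sit inside the map() method of your MapFunction.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LineWiseJsonSketch {

    /** Hypothetical stand-in for a real JSON parser: flat objects, string values only. */
    static final class FlatJsonParser {
        // Matches "key": "value" pairs; escaped quotes are NOT supported.
        private static final Pattern FIELD =
                Pattern.compile("\"([^\"]*)\"\\s*:\\s*\"([^\"]*)\"");

        Map<String, String> parse(String line) {
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher m = FIELD.matcher(line);
            while (m.find()) {
                fields.put(m.group(1), m.group(2));
            }
            return fields;
        }
    }

    /** Parses every line on its own, reusing a single parser instance. */
    static List<Map<String, String>> parseLines(List<String> lines) {
        FlatJsonParser parser = new FlatJsonParser(); // created once, not per element
        return lines.stream()
                .map(parser::parse) // the parse runs per element, inside map
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "{\"name\": \"a\", \"city\": \"x\"}",
                "{\"name\": \"b\", \"city\": \"y\"}");
        for (Map<String, String> record : parseLines(lines)) {
            System.out.println(record.get("name") + " lives in " + record.get("city"));
        }
    }
}
```

The point is that the parser is called on each String inside map, rather than on the DataSet returned by first(1), which is still a DataSet and not a String.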

2016-04-27 6:40 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:

> Hi
>
> So I managed to do the map part. I stuck with the "import
> scala.util.parsing.json._" library for parsing.
>
> First I read my JSON:
>
> val data=env.readTextFile("file:///home/punit/vik-in")
>
> Then I transformed it so that it can be parsed to a map:
>
> val j=data.map { x => ("\"\"\"").+(x).+("\"\"\"") }
>
>
> I checked it by printing the first value of "j" and it looks correct.
>
> But when I tried to parse "j" like this:
>
> JSON.parseFull(j.first(1)), it did not parse, because "j.first(1)" is
> still a DataSet object and not a String object.
>
> So how can I get the underlying Java object from the DataSet object?
>
> On Tue, Apr 26, 2016 at 3:32 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>
> > Hi,
> >
> > you need to implement the MapFunction interface [1].
> > Inside the MapFunction you can use any JSON parser library such as
> > Jackson to parse the String.
> > The exact logic depends on your use case.
> >
> > However, you should be careful not to initialize a new parser in each
> > map() call, because that would be quite expensive.
> > I recommend extending the RichMapFunction [2] and instantiating a parser
> > in the open() method.
> >
> > Best, Fabian
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/dataset_transformations.html#map
> > [2] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html#specifying-transformation-functions
> >
> > 2016-04-26 10:44 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:
> >
> > > Hi Fabian
> > >
> > > Thanks for the reply. Yes, my JSON is separated by new lines. It would
> > > have been great if you had explained the function that goes inside the
> > > map. I tried to use the 'scala.util.parsing.json._' library but had no
> > > luck.
> > >
> > > On Tue, Apr 26, 2016 at 1:11 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> > >
> > > > Hi Punit,
> > > >
> > > > JSON can be hard to parse in parallel due to its nested structure. It
> > > > depends on the schema and (textual) representation of the JSON whether
> > > > and how it can be done. The problem is that a parallel input format
> > > > needs to be able to identify record boundaries without context
> > > > information. This can be very easy if your JSON data is a list of JSON
> > > > objects which are separated by a newline character. However, this is
> > > > hard to generalize. That's why Flink does not offer tooling for it (yet).
> > > >
> > > > If your JSON objects are separated by newline characters, the easiest
> > > > way is to read the file as a text file, where each line results in a
> > > > String, and to parse each object using a standard JSON parser. This
> > > > would look like:
> > > >
> > > > ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> > > >
> > > > DataSet<String> text = env.readTextFile("/path/to/jsonfile");
> > > > DataSet<YourObject> json = text.map(new YourMapFunctionWhichParsesJSON());
> > > >
> > > > Best, Fabian
> > > >
> > > > 2016-04-26 8:06 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:
> > > >
> > > > > Hi
> > > > >
> > > > > I am new to Flink. I was experimenting with the DataSet API and
> > > > > found out that there is no explicit method for loading a JSON file as
> > > > > input. Can anyone please suggest a workaround?
> > > > >
> > > > > --
> > > > > Thank You
> > > > >
> > > > > Regards
> > > > >
> > > > > Punit Naik
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thank You
> > >
> > > Regards
> > >
> > > Punit Naik
> > >
> >
>
>
>
> --
> Thank You
>
> Regards
>
> Punit Naik
>
