flink-dev mailing list archives

From Punit Naik <naik.puni...@gmail.com>
Subject Re: Read JSON file as input
Date Wed, 27 Apr 2016 04:40:07 GMT
Hi

So I managed to do the map part. I stuck with the "import
scala.util.parsing.json._" library for parsing.

First I read my JSON:

val data = env.readTextFile("file:///home/punit/vik-in")

Then I transformed it so that it could be parsed into a map:

val j = data.map { x => "\"\"\"" + x + "\"\"\"" }


I checked it by printing the first value of "j", and it looks correct.

But when I tried to parse "j" like this:

JSON.parseFull(j.first(1))

it did not work, because "j.first(1)" is still a DataSet object and not a
String object.

So how can I get the underlying Java object out of the DataSet object?
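A minimal sketch of the approach described in the quoted replies, assuming Flink 1.0's Java DataSet API and Jackson (`com.fasterxml.jackson.databind`) on the classpath; the class names `JsonLinesJob` and `ParseJsonLine` and the input path are hypothetical. The parsing happens inside the map function, where each record is already a plain String, so no `"""` wrapping is needed (triple quotes are only Scala source syntax for string literals; adding them at runtime just puts literal quote characters into the data). The parser is created once in `open()` rather than in every `map()` call. To inspect results on the client, `collect()` turns a DataSet into a local `java.util.List`.

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

import java.util.List;
import java.util.Map;

public class JsonLinesJob {

    // Hypothetical example function: parses one JSON object per input line into a Map.
    public static class ParseJsonLine extends RichMapFunction<String, Map<String, Object>> {

        // Built once per parallel task instance in open(), then reused for every record.
        // transient: the function object is serialized and shipped to the workers,
        // so the mapper is left out of that payload and created there in open().
        private transient ObjectMapper mapper;

        @Override
        public void open(Configuration parameters) {
            mapper = new ObjectMapper();
        }

        @Override
        public Map<String, Object> map(String line) throws Exception {
            // The line is already a String; it is parsed as-is, without extra quoting.
            return mapper.readValue(line, new TypeReference<Map<String, Object>>() {});
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Hypothetical path; each line must hold one complete JSON object.
        DataSet<String> text = env.readTextFile("file:///path/to/jsonfile");
        DataSet<Map<String, Object>> parsed = text.map(new ParseJsonLine());

        // first(1) is still a DataSet; collect() executes the job and
        // returns the result to the client as a local java.util.List.
        List<Map<String, Object>> sample = parsed.first(1).collect();
        System.out.println(sample);
    }
}
```

Note that `collect()` ships the whole result to the client, so it suits small results such as this one-element sample; for the full data set, the parsed records would normally stay in the DataSet and flow into further transformations or a sink.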

On Tue, Apr 26, 2016 at 3:32 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi,
>
> you need to implement the MapFunction interface [1].
> Inside the MapFunction you can use any JSON parser library such as Jackson
> to parse the String.
> The exact logic depends on your use case.
>
> However, you should be careful not to initialize a new parser in each map()
> call, because that would be quite expensive.
> I recommend extending RichMapFunction and instantiating the parser in its
> open() method [2].
>
> Best, Fabian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/dataset_transformations.html#map
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html#specifying-transformation-functions
>
> 2016-04-26 10:44 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:
>
> > Hi Fabian
> >
> > Thanks for the reply. Yes, my JSON is separated by new lines. It would
> > have been great if you had explained the function that goes inside the
> > map. I tried to use the 'scala.util.parsing.json._' library but had no
> > luck.
> >
> > On Tue, Apr 26, 2016 at 1:11 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> >
> > > Hi Punit,
> > >
> > > JSON can be hard to parse in parallel due to its nested structure.
> > > Whether and how it can be done depends on the schema and the (textual)
> > > representation of the JSON. The problem is that a parallel input format
> > > needs to be able to identify record boundaries without context
> > > information. This can be very easy if your JSON data is a list of JSON
> > > objects separated by newline characters. However, this is hard to
> > > generalize. That's why Flink does not offer tooling for it (yet).
> > >
> > > If your JSON objects are separated by newline characters, the easiest
> > > way is to read the file as a text file, where each line results in a
> > > String, and to parse each object using a standard JSON parser. This
> > > would look like:
> > >
> > > ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> > >
> > > DataSet<String> text = env.readTextFile("/path/to/jsonfile");
> > > DataSet<YourObject> json = text.map(new YourMapFunctionWhichParsesJSON());
> > >
> > > Best, Fabian
> > >
> > > 2016-04-26 8:06 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:
> > >
> > > > Hi
> > > >
> > > > I am new to Flink. I was experimenting with the DataSet API and found
> > > > out that there is no explicit method for loading a JSON file as input.
> > > > Can anyone please suggest a workaround?
> > > >
> > > > --
> > > > Thank You
> > > >
> > > > Regards
> > > >
> > > > Punit Naik
> > > >
> > >
> >
> >
> >
> > --
> > Thank You
> >
> > Regards
> >
> > Punit Naik
> >
>
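On the record-boundary point in the quoted reply: a simplified sketch of why newline-delimited JSON can be split for parallel reading. It follows the usual convention for line-based input formats (a record belongs to the split containing its first character, so a split skips a partial record at its start and reads past its end to finish its last record). This is an illustration in plain Java, not Flink's actual input format code; `LineSplitSketch` and `readSplit` are hypothetical names, and characters stand in for bytes.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSplitSketch {

    /**
     * Returns the newline-delimited records owned by the split [start, end).
     * Hypothetical convention: a record belongs to the split that contains its
     * first character, so no split needs context from its neighbours.
     */
    public static List<String> readSplit(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // Starting mid-record: that record belongs to the previous split, so skip it.
        if (pos > 0 && data.charAt(pos - 1) != '\n') {
            int nl = data.indexOf('\n', pos);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        // Emit every record whose first character lies before `end`,
        // reading past `end` if needed to finish the last one.
        while (pos < end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int recordEnd = (nl < 0) ? data.length() : nl;
            String record = data.substring(pos, recordEnd);
            if (!record.isEmpty()) { // ignore blank lines
                records.add(record);
            }
            pos = recordEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String jsonLines = "{\"id\":1}\n{\"id\":2,\"tags\":[\"a\",\"b\"]}\n{\"id\":3}\n";
        int mid = jsonLines.length() / 2; // arbitrary offset, here inside the second record
        List<String> all = new ArrayList<>(readSplit(jsonLines, 0, mid));
        all.addAll(readSplit(jsonLines, mid, jsonLines.length()));
        // Each complete JSON object is read exactly once across the two splits.
        all.forEach(System.out::println);
    }
}
```

The same boundary rule would break for pretty-printed JSON: an object spread over several lines would be cut into separate, incomplete "records", which is the missing-context problem described in the quoted reply.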



-- 
Thank You

Regards

Punit Naik
