flink-dev mailing list archives

From Punit Naik <naik.puni...@gmail.com>
Subject Re: Read JSON file as input
Date Wed, 27 Apr 2016 07:11:44 GMT
I just tried it, and it still cannot parse it. It still takes the input as a DataSet
object rather than a String.

On Wed, Apr 27, 2016 at 12:36 PM, Punit Naik <naik.punit44@gmail.com> wrote:

> Okay, thanks a lot, Fabian!
>
> On Wed, Apr 27, 2016 at 12:34 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>
>> You should do the parsing in a Map operator. Map applies the MapFunction to
>> each element in the DataSet.
>> So you can either implement another MapFunction or extend the one you have
>> to call the JSON parser.
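Here is a minimal sketch of what "parse inside map" means, written in plain Java so it runs without Flink. A java.util.List and a stream stand in for the DataSet. parseFlatObject is a hypothetical toy parser for flat objects with string values only; it stands in for Jackson or scala.util.parsing.json. What matters is the shape. The parse function receives one line (a String) per call, so you never need to pull a String out of the collection handle.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PerLineParse {

    // Hypothetical toy parser for flat objects like {"id":"1","name":"a"}:
    // keys and values must be plain strings with no escapes, commas, or colons.
    // It stands in for a real JSON library and is not one.
    static Map<String, String> parseFlatObject(String line) {
        String s = line.trim();
        if (!s.startsWith("{") || !s.endsWith("}")) {
            throw new IllegalArgumentException("not a JSON object: " + line);
        }
        Map<String, String> result = new LinkedHashMap<>();
        String body = s.substring(1, s.length() - 1).trim();
        if (body.isEmpty()) {
            return result;
        }
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("bad pair: " + pair);
            }
            result.put(unquote(kv[0]), unquote(kv[1]));
        }
        return result;
    }

    // Strips surrounding whitespace and one pair of double quotes.
    static String unquote(String token) {
        String t = token.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            return t.substring(1, t.length() - 1);
        }
        return t;
    }

    public static void main(String[] args) {
        // Stand-in for the DataSet<String> from readTextFile: one JSON object per line.
        List<String> lines = List.of(
                "{\"id\":\"1\",\"name\":\"alpha\"}",
                "{\"id\":\"2\",\"name\":\"beta\"}");

        // The parser runs inside map(), once per element; it never receives
        // the collection itself.
        List<Map<String, String>> parsed = lines.stream()
                .map(PerLineParse::parseFlatObject)
                .collect(Collectors.toList());

        System.out.println(parsed); // [{id=1, name=alpha}, {id=2, name=beta}]
    }
}
```

In a Flink job, the same per-line body would sit inside a MapFunction<String, ...> passed to text.map(...), with the toy parser replaced by a real JSON library.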
>>
>> 2016-04-27 6:40 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:
>>
>> > Hi
>> >
>> > So I managed to do the map part. I stuck with the "import
>> > scala.util.parsing.json._" library for parsing.
>> >
>> > First I read my JSON:
>> >
>> > val data=env.readTextFile("file:///home/punit/vik-in")
>> >
>> > Then I transformed it so that it can be parsed to a map:
>> >
>> > val j=data.map { x => ("\"\"\"").+(x).+("\"\"\"") }
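In Scala source, triple quotes only delimit a raw string literal. They are not part of the string's value. Concatenating "\"\"\"" at runtime, as the line above does, adds three literal quote characters to each side of the line. The result no longer starts with '{' and is not a single valid JSON value. The small Java check below makes this visible. Its method names are hypothetical.

```java
public class TripleQuoteWrap {

    // Mirrors x => ("\"\"\"").+(x).+("\"\"\""): three literal quote
    // characters are added on each side of the line at runtime.
    static String wrapLikeThread(String line) {
        return "\"\"\"" + line + "\"\"\"";
    }

    // Cheap structural check: a JSON object line starts with '{' and ends with '}'.
    static boolean looksLikeObject(String s) {
        String t = s.trim();
        return t.startsWith("{") && t.endsWith("}");
    }

    public static void main(String[] args) {
        String line = "{\"id\":\"1\"}";
        String wrapped = wrapLikeThread(line);
        System.out.println(wrapped);                  // """{"id":"1"}"""
        System.out.println(looksLikeObject(line));    // true
        System.out.println(looksLikeObject(wrapped)); // false
    }
}
```

Dropping that map step and passing each raw line straight to the parser avoids the problem.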
>> >
>> >
>> > I checked it by printing the first value of "j", and it's proper.
>> >
>> > But when I tried to parse "j" like this:
>> >
>> > JSON.parseFull(j.first(1))
>> >
>> > It did not parse, because "j.first(1)" is still a DataSet object and not a
>> > String object.
>> >
>> > So how can I get the underlying Java object out of the DataSet object?
>> >
>> > On Tue, Apr 26, 2016 at 3:32 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > you need to implement the MapFunction interface [1].
>> > > Inside the MapFunction you can use any JSON parser library, such as
>> > > Jackson, to parse the String.
>> > > The exact logic depends on your use case.
>> > >
>> > > However, you should be careful not to initialize a new parser in each
>> > > map() call, because that would be quite expensive.
>> > > I recommend extending the RichMapFunction and instantiating a parser in
>> > > the open() method.
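The open()-once advice can be sketched without Flink. In Flink 1.0 you would extend org.apache.flink.api.common.functions.RichMapFunction and override open(Configuration). The class below is only a hypothetical stand-in that mirrors that call order: open() once per parallel task, then map() once per record. ExpensiveParser is a placeholder for something like Jackson's ObjectMapper, and its parse() just trims the line.

```java
import java.util.ArrayList;
import java.util.List;

public class OpenOnceSketch {

    // Counts how often the "expensive" parser is constructed.
    static int parsersCreated = 0;

    // Hypothetical placeholder for a heavyweight parser; parse() only trims.
    static final class ExpensiveParser {
        ExpensiveParser() {
            parsersCreated++;
        }

        String parse(String line) {
            return line.trim();
        }
    }

    // Mirrors a rich function's lifecycle: open() once, then map() per record.
    // This is not Flink's RichMapFunction, only an illustration of the call order.
    static final class ParsingTask {
        private ExpensiveParser parser;

        void open() {
            parser = new ExpensiveParser(); // built once, reused by every map() call
        }

        String map(String line) {
            return parser.parse(line); // no per-record construction
        }
    }

    public static void main(String[] args) {
        ParsingTask task = new ParsingTask();
        task.open();
        List<String> out = new ArrayList<>();
        for (String line : List.of(" {\"a\":1} ", " {\"a\":2} ", " {\"a\":3} ")) {
            out.add(task.map(line));
        }
        System.out.println(out);            // [{"a":1}, {"a":2}, {"a":3}]
        System.out.println(parsersCreated); // 1
    }
}
```

Constructing the parser inside map() instead would raise the counter once per record, which is the cost the advice above is meant to avoid.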
>> > >
>> > > Best, Fabian
>> > >
>> > > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/dataset_transformations.html#map
>> > > [2] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html#specifying-transformation-functions
>> > >
>> > > 2016-04-26 10:44 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:
>> > >
>> > > > Hi Fabian
>> > > >
>> > > > Thanks for the reply. Yes, my JSON objects are separated by new lines.
>> > > > It would have been great if you had explained the function that goes
>> > > > inside the map. I tried to use the 'scala.util.parsing.json._' library
>> > > > but had no luck.
>> > > >
>> > > > On Tue, Apr 26, 2016 at 1:11 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> > > >
>> > > > > Hi Punit,
>> > > > >
>> > > > > JSON can be hard to parse in parallel due to its nested structure.
>> > > > > Whether and how it can be done depends on the schema and (textual)
>> > > > > representation of the JSON. The problem is that a parallel input
>> > > > > format needs to be able to identify record boundaries without context
>> > > > > information. This can be very easy if your JSON data is a list of JSON
>> > > > > objects which are separated by a new line character. However, this is
>> > > > > hard to generalize. That's why Flink does not offer tooling for it
>> > > > > (yet).
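The record-boundary point can be made concrete. Newline-delimited input commonly uses the following rule, which is assumed here rather than taken from Flink's source. A split [start, end) owns every line whose first character falls inside it. It therefore skips the partial line at its start, and it reads its last line to completion even if that line runs past end. Below is a self-contained Java sketch over an in-memory String. It uses a hypothetical helper, readSplit.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitReader {

    // Returns the records owned by the split [start, end) of newline-delimited text.
    // Rule: a split owns every line whose first character lies in [start, end).
    static List<String> readSplit(String content, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (pos > 0) {
            // Skip the partial line that began in the previous split. If the
            // character before start is '\n', nl == start - 1 and nothing is skipped.
            int nl = content.indexOf('\n', pos - 1);
            pos = (nl < 0) ? content.length() : nl + 1;
        }
        // Read whole lines while they start inside the split; the last one may
        // extend past end, so a record crossing the boundary stays intact.
        while (pos < end && pos < content.length()) {
            int nl = content.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? content.length() : nl;
            records.add(content.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String content = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n";
        int mid = content.length() / 2; // 13, which falls inside the second record
        System.out.println(readSplit(content, 0, mid));                // [{"id":1}, {"id":2}]
        System.out.println(readSplit(content, mid, content.length())); // [{"id":3}]
    }
}
```

Nested JSON that spans several lines breaks this rule, because a newline no longer marks a record boundary. That is the generalization problem described above.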
>> > > > >
>> > > > > If your JSON objects are separated by new line characters, the
>> > > > > easiest way is to read the file as a text file, where each line
>> > > > > results in a String, and to parse each object using a standard JSON
>> > > > > parser. This would look like:
>> > > > >
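>> > > > > import org.apache.flink.api.java.DataSet;
>> > > > > import org.apache.flink.api.java.ExecutionEnvironment;
>> > > > >
>> > > > > ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>> > > > >
>> > > > > DataSet<String> text = env.readTextFile("/path/to/jsonfile");
>> > > > > // YourObject and YourMapFunctionWhichParsesJSON are placeholders
>> > > > > DataSet<YourObject> json = text.map(new YourMapFunctionWhichParsesJSON());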
>> > > > >
>> > > > > Best, Fabian
>> > > > >
>> > > > > 2016-04-26 8:06 GMT+02:00 Punit Naik <naik.punit44@gmail.com>:
>> > > > >
>> > > > > > Hi
>> > > > > >
>> > > > > > I am new to Flink. I was experimenting with the DataSet API and
>> > > > > > found out that there is no explicit method for loading a JSON file
>> > > > > > as input. Can anyone please suggest a workaround?
>> > > > > >
>> > > > > > --
>> > > > > > Thank You
>> > > > > >
>> > > > > > Regards
>> > > > > >
>> > > > > > Punit Naik
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Thank You
>> > > >
>> > > > Regards
>> > > >
>> > > > Punit Naik
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thank You
>> >
>> > Regards
>> >
>> > Punit Naik
>> >
>>
>
>
>
> --
> Thank You
>
> Regards
>
> Punit Naik
>



-- 
Thank You

Regards

Punit Naik
