hadoop-common-user mailing list archives

From Arko Provo Mukherjee <arkoprovomukher...@gmail.com>
Subject Changing the InputFormat
Date Thu, 26 Feb 2015 23:47:54 GMT
Hello,

I am trying to write a Hadoop program that handles JSON, and hence wrote a
CustomInputFormat to handle the data. The custom format's reader extends
RecordReader and overrides the nextKeyValue() method.

However, this doesn't solve the problem when one JSON object is split
across two InputSplits. I was wondering if there is a way to change how the
input file is broken into InputSplits, so that I can control it and not let
a JSON object break between the splits.
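
To make the boundary problem concrete, here is a minimal, Hadoop-free sketch
of one possible convention, assuming the input holds one JSON object per line
(an assumption; my real data may not be laid out that way). As far as I
understand, it mirrors what Hadoop's line-based reader does: a split that does
not start the file skips its first line, and every split reads past its end to
finish its last record. The class and method names (SplitBoundaryDemo,
readSplit, lineEnd) are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundaryDemo {

    // Index of the next '\n' at or after 'from', or data.length if none.
    static int lineEnd(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return i;
    }

    // Returns the records owned by the split [start, end), assuming one
    // JSON object per line (illustrative assumption).
    public static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Discard the first line found from 'start': the previous split
            // reads it in full, even if it is complete here.
            pos = Math.min(lineEnd(data, pos) + 1, data.length);
        }
        // Read every line that begins at or before 'end'; the last one may
        // extend beyond the split, so the record is never cut in half.
        while (pos <= end && pos < data.length) {
            int e = lineEnd(data, pos);
            records.add(new String(data, pos, e - pos, StandardCharsets.UTF_8));
            pos = e + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n"
                .getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 12 falls in the middle of the second object.
        System.out.println(readSplit(data, 0, 12));
        System.out.println(readSplit(data, 12, data.length));
    }
}
```

With this convention each record is read by exactly one split, as long as the
splits are contiguous. If parallelism within a single file is not needed, I
believe overriding isSplitable() in a FileInputFormat subclass to return false
would keep each file in one split, at the cost of one mapper per file.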

Any help will be much appreciated!

Many thanks in advance!
Warm regards
Arko
