hadoop-mapreduce-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1130) Provide a way to open and read a side file using an existing InputFormat
Date Wed, 21 Oct 2009 22:22:59 GMT
Provide a way to open and read a side file using an existing InputFormat
------------------------------------------------------------------------

                 Key: MAPREDUCE-1130
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1130
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
            Reporter: Pradeep Kamath


In the Pig subproject there is a need to open a side file to implement map-side joins.
In some cases the entire file needs to be read as a side file; in others, the file needs
to be read from a particular split through the last split. To do this with existing
InputFormats, the Pig code would have to mimic Hadoop: call InputFormat.getSplits(), then
for each split call InputFormat.createRecordReader() and RecordReader.initialize(), then
call RecordReader.nextKeyValue() repeatedly until the end of the split is reached, and then
continue with the next split. It would be good to have utility methods in Hadoop that do
this, reading a file either in its entirety or from a given split through to the end.
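
As a rough illustration of the loop described above, here is a minimal sketch that runs
without Hadoop on the classpath. MiniInputFormat, MiniRecordReader, SideFileReader and
demoFormat() are hypothetical stand-ins, not Hadoop classes; they only mirror the shape of
getSplits, createRecordReader, initialize and nextKeyValue. The real Hadoop methods also
take a JobContext or TaskAttemptContext and expose keys as well as values, which this
sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SideFileReader {

    /** Hypothetical stand-in for a record reader: initialize, advance, read, close. */
    public interface MiniRecordReader<V> extends AutoCloseable {
        void initialize(List<V> split);
        boolean nextKeyValue();
        V getCurrentValue();
        @Override void close();
    }

    /** Hypothetical stand-in for an input format: list splits, open a reader per split. */
    public interface MiniInputFormat<V> {
        List<List<V>> getSplits();
        MiniRecordReader<V> createRecordReader(List<V> split);
    }

    /** Reads every record from split index firstSplit through the last split. */
    public static <V> List<V> readFrom(MiniInputFormat<V> format, int firstSplit) {
        List<V> records = new ArrayList<>();
        List<List<V>> splits = format.getSplits();
        for (int i = firstSplit; i < splits.size(); i++) {
            List<V> split = splits.get(i);
            // One reader per split, closed before moving on to the next split.
            try (MiniRecordReader<V> reader = format.createRecordReader(split)) {
                reader.initialize(split);
                while (reader.nextKeyValue()) {
                    records.add(reader.getCurrentValue());
                }
            }
        }
        return records;
    }

    /** Reads the whole file, i.e. starts at the first split. */
    public static <V> List<V> readAll(MiniInputFormat<V> format) {
        return readFrom(format, 0);
    }

    /** In-memory format with three splits, used only to exercise the loop. */
    public static MiniInputFormat<String> demoFormat() {
        final List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c", "d"), Arrays.asList("e"));
        return new MiniInputFormat<String>() {
            public List<List<String>> getSplits() { return splits; }
            public MiniRecordReader<String> createRecordReader(List<String> split) {
                return new MiniRecordReader<String>() {
                    private Iterator<String> it;
                    private String current;
                    public void initialize(List<String> s) { it = s.iterator(); }
                    public boolean nextKeyValue() {
                        if (!it.hasNext()) return false;
                        current = it.next();
                        return true;
                    }
                    public String getCurrentValue() { return current; }
                    public void close() { it = null; }
                };
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(readAll(demoFormat()));     // prints [a, b, c, d, e]
        System.out.println(readFrom(demoFormat(), 1)); // prints [c, d, e]
    }
}
```

The two entry points correspond to the two cases in the request: readAll for the entire
side file, and readFrom for a read that begins at a particular split and runs to the end.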

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

