hive-dev mailing list archives

From "Zheng Shao (JIRA)" <>
Subject [jira] Created: (HIVE-1050) Reduce the memory foot-print of HiveInputSplit
Date Wed, 13 Jan 2010 08:38:54 GMT
Reduce the memory foot-print of HiveInputSplit

                 Key: HIVE-1050
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao

{{HiveInputSplit}} currently inherits from {{FileSplit}} only because we want {{MapTask}} to forward
the input file name to the mapper. This makes {{HiveInputSplit}} big (see MAPREDUCE-1374).
The relevant code in {{MapTask}}:

  private void updateJobWithSplit(final JobConf job, InputSplit inputSplit) {
    if (inputSplit instanceof FileSplit) {
      FileSplit fileSplit = (FileSplit) inputSplit;
      job.set("map.input.file", fileSplit.getPath().toString());
      job.setLong("map.input.start", fileSplit.getStart());
      job.setLong("map.input.length", fileSplit.getLength());
      LOG.info("split: " + job.get("map.input.file") + ", range: "
               + job.getLong("map.input.start", 0) + "-"
               + job.getLong("map.input.length", 0));
    }
  }


Once we move to the new MapReduce framework, we should be able to make {{HiveInputFormat}} produce
smaller splits, which will reduce the amount of memory needed on the {{JobClient}}.
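
The dependency on the {{instanceof FileSplit}} check can be illustrated with a small, self-contained sketch. It does not use the Hadoop classes: {{Split}}, {{PathSplit}}, {{InheritingSplit}}, {{WrappedSplit}} and {{updateConf}} are hypothetical stand-ins for {{InputSplit}}, {{FileSplit}}, a split that extends {{FileSplit}}, a split that only wraps one, and {{updateJobWithSplit}}, and a plain {{Map}} stands in for {{JobConf}}. It only shows why a split that does not extend the file-split type would not get {{map.input.file}} set.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitForwardingSketch {
    // Hypothetical, simplified stand-in for Hadoop's InputSplit.
    interface Split {
        long getLength();
    }

    // Hypothetical stand-in for FileSplit: a path plus a byte range.
    static class PathSplit implements Split {
        private final String path;
        private final long start;
        private final long length;

        PathSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        String getPath() { return path; }
        long getStart() { return start; }
        public long getLength() { return length; }
    }

    // Extends the file-split type, so the instanceof check recognises it.
    static class InheritingSplit extends PathSplit {
        InheritingSplit(PathSplit inner) {
            super(inner.getPath(), inner.getStart(), inner.getLength());
        }
    }

    // Only wraps a file split, so the instanceof check does not match.
    static class WrappedSplit implements Split {
        private final PathSplit inner;

        WrappedSplit(PathSplit inner) { this.inner = inner; }

        public long getLength() { return inner.getLength(); }
    }

    // Mirrors the shape of updateJobWithSplit, with a Map in place of JobConf.
    static Map<String, String> updateConf(Split split) {
        Map<String, String> conf = new HashMap<>();
        if (split instanceof PathSplit) {
            PathSplit p = (PathSplit) split;
            conf.put("map.input.file", p.getPath());
            conf.put("map.input.start", Long.toString(p.getStart()));
            conf.put("map.input.length", Long.toString(p.getLength()));
        }
        return conf;
    }

    public static void main(String[] args) {
        PathSplit file = new PathSplit("/warehouse/t/part-00000", 0L, 1024L);
        // Inheriting split: the file name is forwarded.
        System.out.println(updateConf(new InheritingSplit(file)).get("map.input.file"));
        // Wrapping split: nothing is forwarded, so the lookup yields null.
        System.out.println(updateConf(new WrappedSplit(file)).get("map.input.file"));
    }
}
```

In this sketch the inheriting split gets its path forwarded while the wrapping split does not. That is the constraint the issue describes: a lighter split is only practical once the framework no longer relies on this type check.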

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
