hadoop-common-user mailing list archives

From Steve Gao <steve....@yahoo.com>
Subject Is there a way to know the input filename at Hadoop Streaming?
Date Thu, 23 Oct 2008 01:55:42 GMT
I am using Hadoop Streaming, and the input consists of multiple files.
Is there a way to get the name of the current input file in the mapper?

For example:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input file1 \
    -input file2 \
    -output myOutputDir \
    -mapper mapper \
    -reducer reducer

In the mapper:
while (<STDIN>) {
  # How can I tell whether the current line comes from file1 or file2?
}
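
One possible approach, sketched under assumptions: Hadoop Streaming passes job configuration values to the mapper process as environment variables, with the dots in each property name replaced by underscores. The path of the file being read is the property map.input.file on older releases, so it should show up as map_input_file; newer releases name it mapreduce_map_input_file. The helper name tag_lines and the "unknown" fallback below are illustrative only, and it is worth checking the variable name against your Hadoop version. From Perl the same value should be readable as $ENV{map_input_file}.

```shell
#!/bin/sh
# Sketch of a streaming mapper that prefixes every input line with the
# path of the file it came from. tag_lines is a hypothetical helper name.
tag_lines() {
  # Assumption: the framework exported the input path under one of these
  # names (newer name first, then the older one). "unknown" is only a
  # fallback for runs outside Hadoop.
  input_file="${mapreduce_map_input_file:-${map_input_file:-unknown}}"

  # Emit "<input path><TAB><original line>" for each line on stdin.
  while IFS= read -r line; do
    printf '%s\t%s\n' "$input_file" "$line"
  done
}

tag_lines
```

If this were saved as mapper.sh (a hypothetical name), it could be supplied with -mapper mapper.sh and shipped to the task nodes with -file mapper.sh. The reducer could then split each record on the first tab to recover the source file.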
