hadoop-common-user mailing list archives

From Zhengguo 'Mike' SUN <zhengguo...@yahoo.com>
Subject Re: [Help needed] Is there a way to know the input filename at Hadoop Streaming?
Date Thu, 23 Oct 2008 18:16:10 GMT
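I guess one trick you can use without any help from Hadoop is to encode the file identifier
inside the file itself. For example, each line of file1 could start with "1", then a space,
then the content of the original line. The mapper can then read the identifier off the front
of each line to tell which file it came from.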
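
A rough sketch of that trick in shell, in case it helps. The helper names tag_lines and untag_line are made up for illustration and are not part of Hadoop, and the sketch assumes the identifier is a simple token with no spaces in it:

```shell
#!/bin/sh
# Hypothetical helpers illustrating the "identifier inside the file" trick.

# Prefix every line read from stdin with a file identifier and one space.
# Assumed usage before uploading the input, e.g.:
#   tag_lines 1 < file1 > file1.tagged
#   tag_lines 2 < file2 > file2.tagged
tag_lines() {
    awk -v id="$1" '{ print id " " $0 }'
}

# Mapper side: split one tagged line back into identifier and content,
# printed as "identifier<TAB>content".
untag_line() {
    line=$1
    file_id=${line%% *}   # everything before the first space
    content=${line#* }    # everything after the first space
    printf '%s\t%s\n' "$file_id" "$content"
}
```

The cost is one extra pass over the input before the job runs, and the mapper has to strip the prefix before it processes each line.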



----- Original Message ----
From: Steve Gao <steve.gao@yahoo.com>
To: core-user@hadoop.apache.org
Cc: core-dev@hadoop.apache.org
Sent: Thursday, October 23, 2008 1:48:11 PM
Subject: [Help needed] Is there a way to know the input filename at Hadoop Streaming?

Sorry for the email. Thanks for any help or hint.

    I am using Hadoop Streaming. The input consists of multiple files.
    Is there a way to get the current filename in the mapper?

    For example:
    $HADOOP_HOME/bin/hadoop  \
    jar $HADOOP_HOME/hadoop-streaming.jar \
        -input file1 \
        -input file2 \
        -output myOutputDir \
        -mapper mapper \
        -reducer reducer

    In mapper:
    while (<STDIN>) {
      # how to tell whether the current line is from file1 or file2?
    }


      