hadoop-common-dev mailing list archives

From Steve Gao <steve....@yahoo.com>
Subject RE: Is there a way to know the input filename at Hadoop Streaming?
Date Thu, 23 Oct 2008 18:09:33 GMT
Thanks, Amogh, but my case is slightly different. The command line takes two input files,
file1 and file2, and the mapper needs to tell which file each line came from:
# In mapper
while (<STDIN>) {
  # How to tell whether the current line is from file1 or file2?
}

The jobconf's map.input.file parameter does not seem to help in this case,
because file1 and file2 are both inputs.

-Steve

--- On Thu, 10/23/08, Amogh Vasekar <vasekar@yahoo-inc.com> wrote:
From: Amogh Vasekar <vasekar@yahoo-inc.com>
Subject: RE: Is there a way to know the input filename at Hadoop Streaming?
To: steve.gao@yahoo.com
Date: Thursday, October 23, 2008, 12:11 AM

Personally, I haven't worked with streaming, but I guess the jobconf's
map.input.file param should do it for you.
-----Original Message-----
From: Steve Gao [mailto:steve.gao@yahoo.com] 
Sent: Thursday, October 23, 2008 7:26 AM
To: core-user@hadoop.apache.org
Cc: core-dev@hadoop.apache.org
Subject: Is there a way to know the input filename at Hadoop Streaming?

I am using Hadoop Streaming. The input consists of multiple files.
Is there a way to get the current filename in the mapper?

For example:
$HADOOP_HOME/bin/hadoop  \
jar $HADOOP_HOME/hadoop-streaming.jar \
    -input file1 \
    -input file2 \
    -output myOutputDir \
    -mapper mapper \
    -reducer reducer

In mapper:
while (<STDIN>) {
  # How to tell whether the current line is from file1 or file2?
}
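
One possible approach to the question above, offered as a sketch rather than a confirmed answer from this thread: Hadoop Streaming passes job configuration values to the mapper process as environment variables, with dots in the name replaced by underscores, so map.input.file should be visible as map_input_file (later releases use mapreduce_map_input_file). Each map task reads a split of a single file, so the value stays the same for every line that one mapper process sees, even when the job has several -input arguments. The example below is in Python rather than Perl; the helper names current_input_file and tag_lines and the "unknown" fallback are made up for illustration. In Perl the same lookup would be $ENV{map_input_file}.

```python
#!/usr/bin/env python3
"""Streaming mapper sketch: prefix each input line with its source file name."""
import os
import sys


def current_input_file(environ=os.environ):
    """Return the input file path for this map task, or "unknown".

    Assumption: streaming exports job configuration as environment
    variables with dots replaced by underscores. Both the newer and the
    older variable names are checked, newer first.
    """
    for name in ("mapreduce_map_input_file", "map_input_file"):
        value = environ.get(name)
        if value:
            return value
    return "unknown"


def tag_lines(lines, source):
    """Yield "<file name><TAB><line>" for each input line."""
    label = os.path.basename(source)
    for line in lines:
        yield label + "\t" + line.rstrip("\n")


def main():
    # The source file is fixed for the lifetime of this mapper process,
    # so it is looked up once rather than per line.
    source = current_input_file()
    for tagged in tag_lines(sys.stdin, source):
        print(tagged)


if __name__ == "__main__":
    main()
```

With this as the -mapper script, lines read from file1 would come out prefixed with "file1" and lines from file2 with "file2", so the reducer can tell them apart. Printing the environment from a trial mapper is a quick way to confirm which variable name a given Hadoop version actually sets.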




      



      