hadoop-common-user mailing list archives

From "Brose, Eric" <eric.br...@navteq.com>
Subject multiple inputs for mapper using python and streaming
Date Mon, 10 May 2010 18:58:37 GMT
Hey all,
I'm new to Hadoop and to streaming with Python.
I am struggling to figure out how to use multiple inputs. For my map job I need to load
one HDFS file into a Python list, then loop over a second HDFS file and look for matches
against that list.
I am trying something like this:

# this would be input #1
for line in sys.stdin(#1):

# now for the second input
for line in sys.stdin(#2):
    # load line into new Python list

    for m in Pylist1:
        # try to find a match and do some other Python stuff

My job is run with this:
bin/hadoop jar contrib/streaming/...jar \
    -file /user/mapper.py -mapper /user/mapper.py \
    -file /user/reducer.py -reducer /user/reducer.py \
    -input /hadoopfs/sample/* -output /hadoopfs/sampleout
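
For reference, a streaming mapper only gets one stdin, so the two-input idea above is usually restructured: the smaller file is read as an ordinary local file and only the larger input arrives on sys.stdin. Below is a minimal sketch of that shape, not a definitive implementation. It assumes the lookup file has been made available in the task's working directory (for example by shipping it with an extra -file option), that its name is passed as the first argument to the mapper, and that a "match" means a substring match. The function names and the matching rule are hypothetical.

```python
#!/usr/bin/env python
import sys


def load_lookup(path):
    """Read the side file into a list, skipping blank lines."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def match_lines(lines, lookup):
    """Yield (entry, line) for every lookup entry found inside a line."""
    for line in lines:
        line = line.rstrip("\n")
        for entry in lookup:
            if entry in line:  # hypothetical rule: plain substring match
                yield entry, line


def main(lookup_path):
    # Load the small file once, then stream the main input from stdin.
    lookup = load_lookup(lookup_path)
    for entry, line in match_lines(sys.stdin, lookup):
        # Streaming treats the text before the first tab as the key.
        sys.stdout.write("%s\t%s\n" % (entry, line))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        sys.stderr.write("usage: mapper.py <lookup-file>\n")
```

The lookup list is rebuilt in every map task, so this only suits a file small enough to fit in memory. If the file exists only in HDFS, it would first have to be copied to the submitting machine or distributed some other way before the mapper can open it locally.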

Thanks ahead of time!

Eric Brose
Senior Data Analyst, NCS
NAVTEQ - Chicago
(T) +312-894-7318
(F) +312-894-8227

