hadoop-hdfs-user mailing list archives

From Jay Hacker <jayqhac...@gmail.com>
Subject Re: Using hadoop streaming with binary data
Date Thu, 21 Feb 2013 17:45:17 GMT
I was able to write a little code to make this happen, and submitted a
patch to Hadoop:

https://issues.apache.org/jira/browse/MAPREDUCE-5018

There is a jar file and shell script there for anybody who wants to try
this without recompiling all of Hadoop.  It lets you run something like
"mapstream indir md5sum outdir" and get one map task per file in indir, with
the raw binary data passed to your map command and the output written to a
file in outdir.  This makes it easy to run all your favorite Unix commands
as map-only streaming jobs, taking advantage of reliable distributed
execution.
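
To make the per-file behavior concrete, here is a rough local sketch in Python of what a single invocation amounts to.  It is not the code from the patch and it does not touch Hadoop at all: it just runs a command once per file, serially, on one machine.  The function name run_per_file and the choice to name each output file after its input are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def run_per_file(indir, command, outdir):
    """Run `command` once per regular file in `indir`, locally and serially.

    Each file's raw bytes are fed to the command's stdin, and the command's
    stdout is written to a file in `outdir`. Naming the output after the
    input file is an assumption of this sketch.
    """
    in_path, out_path = Path(indir), Path(outdir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(p for p in in_path.iterdir() if p.is_file()):
        dst = out_path / src.name
        # Open both ends in binary mode so no decoding or newline
        # translation ever touches the data.
        with src.open("rb") as fin, dst.open("wb") as fout:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(command, stdin=fin, stdout=fout, check=True)
        written.append(dst)
    return written
```

Calling run_per_file("indir", ["md5sum"], "outdir") would be the local analogue of the mapstream example above, assuming md5sum is on the PATH.  What the distributed version adds is that each file is handled by its own map task on the cluster, with failed tasks retried for you.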