hadoop-common-user mailing list archives

From openresearch <Qiming...@openresearchinc.com>
Subject hadoop streaming binary input / image processing
Date Thu, 14 May 2009 16:39:55 GMT


I have read some recommendations regarding image (binary input) processing
using Hadoop streaming, which accepts only text out of the box for now.

However, I have not found a straight answer.

One recommendation is to put the image data on HDFS, but then we have to do
"hadoop fs -get" for each file/dir and process it locally, which is very expensive.

Another recommendation is to "...put them in a centralized place where all
the hadoop nodes can access them (via .e.g, NFS mount)..." Obviously, IO
will become the bottleneck, and that defeats the purpose of distributed processing.

I also notice that an enhancement ticket is open for hadoop-core. Has it been
committed to any svn (0.21) branch? Can somebody show me an example of how to
take *.jpg files (from HDFS) and process them in a distributed fashion
using streaming?
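
To make the question concrete, here is a rough sketch of the kind of workaround I have in mind, not a confirmed solution. It assumes the job input is a plain text file listing one HDFS image path per line, so streaming only ever sees text, and that each mapper pulls the bytes itself with "hadoop fs -cat". The names fetch_from_hdfs, process_image and map_paths, the script name, and the --map flag are all made up for illustration, and the digest is only a stand-in for real image processing.

```python
import hashlib
import subprocess
import sys


def fetch_from_hdfs(path):
    """Return the raw bytes of one HDFS file by piping `hadoop fs -cat`."""
    result = subprocess.run(
        ["hadoop", "fs", "-cat", path], check=True, stdout=subprocess.PIPE
    )
    return result.stdout


def process_image(data):
    """Stand-in for real image work: report byte size and SHA-256 digest."""
    return "%d\t%s" % (len(data), hashlib.sha256(data).hexdigest())


def map_paths(lines, fetch=fetch_from_hdfs):
    """Yield one tab-separated text record per non-empty input path."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the path list
        yield "%s\t%s" % (path, process_image(fetch(path)))


# Hypothetical entry point: only runs as a mapper when --map is passed.
if __name__ == "__main__" and "--map" in sys.argv:
    for record in map_paths(sys.stdin):
        print(record)
```

With this layout the mapper would be passed to streaming as something like -mapper "python image_mapper.py --map", with the path list as -input. It still reads every image through the hadoop client rather than as a local split, so it may not avoid the cost I describe above.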

Many thanks

View this message in context: http://www.nabble.com/hadoop-streaming-binary-input---image-processing-tp23544344p23544344.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.