hadoop-common-user mailing list archives

From openresearch <Qiming...@openresearchinc.com>
Subject hadoop streaming binary input / image processing
Date Thu, 14 May 2009 16:39:55 GMT

All,

I have read some recommendations regarding image (binary input) processing
using Hadoop Streaming, which only accepts text out of the box for now:
http://hadoop.apache.org/core/docs/current/streaming.html
https://issues.apache.org/jira/browse/HADOOP-1722
http://markmail.org/message/24woaqie2a6mrboc

However, I have not found a straight answer.

One recommendation is to put the image data on HDFS, but then we have to run
"hadoop fs -get" for each file/dir and process it locally, which is very expensive.
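For concreteness, here is a minimal sketch of that first approach. It assumes the job input is a plain text file listing one HDFS image path per line, so Streaming hands each mapper a batch of paths rather than the image bytes. The mapper copies each image to local scratch space with `hadoop fs -get`, runs a placeholder step, and prints `path<TAB>result`. The names `map_paths`, `fetch_with_hadoop_fs` and `process_image` are hypothetical, and `process_image` here only reports the file size.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, Iterable, Iterator


def fetch_with_hadoop_fs(hdfs_path: str, local_dir: str) -> str:
    """Copy one file from HDFS into local_dir using the `hadoop fs -get` CLI."""
    local_path = os.path.join(local_dir, os.path.basename(hdfs_path))
    subprocess.run(["hadoop", "fs", "-get", hdfs_path, local_path], check=True)
    return local_path


def process_image(local_path: str) -> str:
    """Hypothetical placeholder for the real image processing; returns the size in bytes."""
    return str(os.path.getsize(local_path))


def map_paths(lines: Iterable[str],
              fetch: Callable[[str, str], str] = fetch_with_hadoop_fs,
              process: Callable[[str], str] = process_image) -> Iterator[str]:
    """Mapper body: each input line names one image on HDFS.

    Yields 'path<TAB>result' lines (tab-separated key/value on stdout)."""
    with tempfile.TemporaryDirectory() as scratch:
        for line in lines:
            # If the input format also passes a key, keep only the last field.
            hdfs_path = line.rstrip("\n").split("\t")[-1].strip()
            if not hdfs_path:
                continue
            local_path = fetch(hdfs_path, scratch)
            try:
                yield f"{hdfs_path}\t{process(local_path)}"
            finally:
                # Delete the local copy so the next image with the same name can be fetched.
                os.remove(local_path)


if __name__ == "__main__":
    for out in map_paths(sys.stdin):
        print(out)
```

Because the mapper only ever sees paths, data locality is lost: each image is pulled over the network from wherever HDFS stored it, and this per-file copy is the cost described above.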

Another recommendation is to "...put them in a centralized place where all
the hadoop nodes can access them (via e.g., NFS mount)..." Obviously, IO
will become a bottleneck, which defeats the purpose of distributed processing.

I also noticed that an enhancement ticket is open for hadoop-core. Has it been
committed to any svn branch (0.21)? Can somebody show me an example of how to
take *.jpg files from HDFS and process them in a distributed fashion
using streaming?

Many thanks

-Qiming
-- 
View this message in context: http://www.nabble.com/hadoop-streaming-binary-input---image-processing-tp23544344p23544344.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

