hadoop-common-user mailing list archives

From Zak Stone <zst...@gmail.com>
Subject Re: streaming a binary processing file
Date Wed, 03 Jun 2009 22:41:06 GMT
One simple solution is to use Dumbo, a Python interface to Hadoop that
supports binary streaming:

http://wiki.github.com/klbostee/dumbo
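
If you would rather stay with plain streaming, one workaround that is often suggested is to make the job input a text file listing one HDFS path per line, and let a small wrapper mapper fetch each file and run the executable on it. The sketch below assumes that layout; the executable name `./image_meta` is hypothetical, as are the helper names, and it assumes the `hadoop` command is on the PATH of every task node. Treat it as an illustration, not a tested recipe.

```python
#!/usr/bin/env python
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical name for the standalone executable from the question.
TOOL = "./image_meta"


def fetch_from_hdfs(hdfs_path, local_dir):
    """Copy one HDFS file into local_dir and return the local path."""
    local_path = os.path.join(local_dir, os.path.basename(hdfs_path))
    subprocess.check_call(["hadoop", "fs", "-get", hdfs_path, local_path])
    return local_path


def run_tool(local_path):
    """Run the executable on a local file and return its output on one line."""
    out = subprocess.check_output([TOOL, local_path])
    # Collapse whitespace so the value cannot break the key<TAB>value record.
    return " ".join(out.decode("utf-8", "replace").split())


def map_paths(lines, fetch=fetch_from_hdfs, extract=run_tool):
    """Yield 'path<TAB>metadata' for each non-empty input line."""
    for line in lines:
        hdfs_path = line.strip()
        if not hdfs_path:
            continue
        workdir = tempfile.mkdtemp()
        try:
            local_path = fetch(hdfs_path, workdir)
            yield "%s\t%s" % (hdfs_path, extract(local_path))
        finally:
            # Remove the scratch copy so task nodes do not fill up.
            shutil.rmtree(workdir, ignore_errors=True)


def main():
    """Streaming entry point: paths arrive on stdin, records go to stdout."""
    for record in map_paths(sys.stdin):
        sys.stdout.write(record + "\n")
```

To use it as the mapper, end the script with a call to `main()` and ship both the script and the executable with the job (streaming's `-file` option is the usual way). The `fetch` and `extract` parameters exist only so the loop can be tried locally without a cluster.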

Zak


On Wed, Jun 3, 2009 at 5:18 PM, openresearch
<Qiming.He@openresearchinc.com> wrote:
>
> Hi all,
>
> I have an urgent question about processing binary (image) data with
> Hadoop streaming.
> I am looking for the simplest solution, preferably one that requires no
> changes to Hadoop or the streaming package.
>
> I got some hints from this mailing list, including using a customized
> InputFormat or SequenceFileInputFormat, but nothing has really helped. Here
> is my problem:
>
> 1. A lot of binary (image) files are stored on HDFS.
> 2. A standalone executable takes a binary (e.g., image) filename as input (key)
> and emits small metadata as the value (e.g., the size of the image).
>
> How can we pass this standalone program to streaming as a mapper to
> process the images across all nodes, given that streaming currently reads
> only from stdin by default?
>
> Thanks.
>
> -Qiming
