hadoop-common-user mailing list archives

From "Jim R. Wilson" <wilson.ji...@gmail.com>
Subject Re: Image processing with Hadoop and PHP
Date Sun, 01 Jun 2008 22:02:13 GMT
Hi Sheraz,

As the others mentioned, one way to do this is via hadoop-streaming,
which allows you to specify any program as the mapper and reducer
parts of the MapReduce algorithm implemented by Hadoop.

In your case, I'd imagine a solution looking something like this:

1) Collect a batch of images together for processing and put them in a
centralized place where all the hadoop nodes can access them.

2) Create a text file which lists all the files, line by line - the
program that does this can be a simple CLI PHP script which globs a
directory tree.  The text file will be used as input to your mapreduce
job.
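
Here's a minimal sketch of such a lister.  The script name, default
directory, and extension list are just placeholders - adjust them for
your layout:

  <?php
  // list_images.php (hypothetical name): walk a directory tree and
  // print one image path per line on standard output.
  $root = isset($argv[1]) ? $argv[1] : '/shared/images/incoming';

  $iter = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($root));
  foreach ($iter as $file) {
      if ($file->isFile()
          && preg_match('/\.(jpe?g|png|gif|tiff?)$/i', $file->getFilename())) {
          echo $file->getPathname(), "\n";
      }
  }

Run it with something like "php list_images.php /shared/images >
output.txt" to produce the job input.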

3) Create a mapper program in PHP which reads file names, line-by-line
from standard input and manipulates the images, writing the new images
out to some location.  The output of the mapper should be serialized
statistics about what it just did.  For example, you'd probably want
to output the file name which was operated on, maybe the size of the
file, whether it was successful, or an error message if it failed -
that kind of thing.
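
As a rough sketch (not a drop-in implementation), a thumbnail-making
mapper using PHP's GD extension might look like this - the output
directory, thumbnail width, and JPEG-only assumption are all
placeholders:

  <?php
  // mapper.php (sketch): read image paths from STDIN, write thumbnails,
  // and emit tab-separated stats on STDOUT for the reducer.
  $outDir = '/shared/images/thumbs';        // assumed destination

  while (($line = fgets(STDIN)) !== false) {
      $path = trim($line);
      if ($path === '') continue;

      $src = @imagecreatefromjpeg($path);   // assumes JPEG input
      if ($src === false) {
          echo "$path\terror\tcould not read image\n";
          continue;
      }

      $w = imagesx($src);
      $h = imagesy($src);
      $tw = 128;
      $th = (int) round($h * $tw / $w);

      $thumb = imagecreatetruecolor($tw, $th);
      imagecopyresampled($thumb, $src, 0, 0, 0, 0, $tw, $th, $w, $h);
      imagejpeg($thumb, $outDir . '/' . basename($path));

      echo "$path\tok\t" . filesize($path) . "\n";

      imagedestroy($src);
      imagedestroy($thumb);
  }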

4) Create a "reducer" program, also in PHP, which reads line-by-line
input from standard input.  This input will be the output of your
mapper program, so your reducer can do things like count how many
files were successfully moved, how many bytes were in all of those
files, etc.
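
A matching reducer sketch, under the same assumptions about the
mapper's tab-separated output, could be as simple as:

  <?php
  // reducer.php (sketch): total up the mapper's per-file stats.
  $ok = 0; $failed = 0; $bytes = 0;

  while (($line = fgets(STDIN)) !== false) {
      $fields = explode("\t", trim($line));
      if (count($fields) < 2) continue;

      if ($fields[1] === 'ok') {
          $ok++;
          $bytes += (int) $fields[2];
      } else {
          $failed++;
      }
  }

  echo "processed\t$ok\n";
  echo "failed\t$failed\n";
  echo "bytes\t$bytes\n";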

5) To test everything, you can take the file list you created in
step 2 and pipe it into your mapper directly on the command line with
something like this:
  cat output.txt | php mapper.php
Then when that looks good, pipe them all together to see the reducer output:
  cat output.txt | php mapper.php | php reducer.php

6) Once your test runs all look good using this command-line
methodology on small subsets of images, you're ready to invoke
hadoop-streaming on your mapper/reducer pair.  You'll still have to
run step 2 for each batch to get a text file suitable for input to the
mapreduce job, but these steps can all be combined into a shell script
once it's working well.
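
For reference, the streaming invocation will look roughly like the
following - the jar location and input/output paths are assumptions
that depend on your Hadoop version and install (see the streaming wiki
page Yuri linked), and PHP has to be installed on every node:

  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
      -input output.txt \
      -output image-stats \
      -mapper "php mapper.php" \
      -reducer "php reducer.php" \
      -file mapper.php \
      -file reducer.php

The -file options ship your two PHP scripts out to the task nodes.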

Good luck!

-- Jim R. Wilson (jimbojw)

On Sun, Jun 1, 2008 at 10:56 AM, Yuri Kudryavcev <mail@ykud.com> wrote:
> Hi.
> I guess you can use Hadoop Streaming (
> http://wiki.apache.org/hadoop/HadoopStreaming) if you pack your PHP image
> processing into executables.  Those will run over the Hadoop cluster.
> - Yuri.
>
> On Sun, Jun 1, 2008 at 7:38 PM, Sheraz Sharif <sheraz@m3b.net> wrote:
>
>> Hi all,
>>
>> I am new to Hadoop and have very little experience with Java.  However, I
>> am very experienced with PHP.  I've seen one web page where a guy wrote
>> a map-reduce function in PHP, and others in Python.
>>
>> I would like to receive hundreds, if not thousands, of images a day and
>> process them into thumbnails or other transformations (rotations, cropping,
>> compositing).  The distributed part of Hadoop seems perfect for this.  Can I
>> use PHP to do the actual image processing?
>>
>> I've seen one post last month about someone processing TIFF images, but
>> that was in Java.
>>
>> Thanks for any help or suggestions.
>>
>> Cheers!
>>
>>
>
