hadoop-general mailing list archives

From daniel sikar <dsi...@gmail.com>
Subject Re: Image as input to M-R in Hadoop
Date Tue, 30 Nov 2010 00:52:38 GMT
Hi Aravinth

This is probably do-able with Hadoop Streaming.

Imagine you have copied a bunch of image files to HDFS and now you
want to feed them to, say, an executable. Odds are that the executable
already exists, with command-line options that take, among other
things, the file path of the image you would like to process.

Hadoop Streaming makes a number of environment variables available at
runtime, for instance "map_input_file" which gives you the file name
of the file being processed, and so forth. My guess is that there is
also an environment variable that will give you the filepath in the
local filesystem.
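
As a rough illustration of the environment-variable idea, here is a
minimal Python mapper sketch. It relies on Hadoop Streaming exporting
job configuration entries as environment variables with "." replaced
by "_". The fallback name mapreduce_map_input_file is what I believe
newer releases use, and the helper names (input_file_from_env,
base_name) are made up for this example. Note that the value is the
file's URI in HDFS, not a path on the local filesystem.

```python
#!/usr/bin/env python3
"""Streaming mapper sketch: report which input file this task is reading."""
import os
import sys

# Hadoop Streaming exports job configuration entries as environment
# variables, replacing "." with "_". Older releases set map_input_file;
# the second name is assumed to be the newer equivalent.
CANDIDATE_VARS = ("map_input_file", "mapreduce_map_input_file")


def input_file_from_env(environ):
    """Return the current input file URI, or None if no variable is set."""
    for name in CANDIDATE_VARS:
        value = environ.get(name)
        if value:
            return value
    return None


def base_name(uri):
    """Strip scheme, host and directories: hdfs://nn/a/b.jpg -> b.jpg."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    uri = input_file_from_env(os.environ)
    if uri is not None:  # only set when launched by Hadoop Streaming
        # Drain the raw input as bytes so the task finishes cleanly.
        for _ in sys.stdin.buffer:
            pass
        # Emit one tab-separated key/value record for this task.
        print("%s\t%s" % (base_name(uri), uri))
```

Run outside Hadoop, the script prints nothing because neither
variable is set.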

You need to code that into your mapper script, and add a -file
parameter so that your executable is shipped with the job. If you are
using Amazon's EMR, you will need to put your code and executable into
an S3 bucket, then specify the bucket name to Hadoop Streaming.
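
One possible way to wire this together, sketched in Python under some
assumptions that are mine and not something I have tested: the job
input is a plain text file listing one HDFS image path per line, the
mapper copies each image to a local temporary directory with
"hadoop fs -get", and then runs the executable on the local copy. The
executable name is passed as the first command-line argument and is
hypothetical, as are the function names.

```python
#!/usr/bin/env python3
"""Streaming mapper sketch: fetch each listed image and run a program on it."""
import os
import subprocess
import sys
import tempfile


def fetch_command(hdfs_path, local_dir):
    """Build the 'hadoop fs -get' command and the local destination path."""
    local_path = os.path.join(local_dir, os.path.basename(hdfs_path))
    return ["hadoop", "fs", "-get", hdfs_path, local_path], local_path


def run_mapper(lines, executable, run=subprocess.check_output):
    """Yield (hdfs_path, program_output) for every non-empty input line."""
    for line in lines:
        # Take the last tab-separated field in case a key precedes the path.
        hdfs_path = line.rstrip("\n").split("\t")[-1].strip()
        if not hdfs_path:
            continue
        with tempfile.TemporaryDirectory() as tmp:
            get_cmd, local_path = fetch_command(hdfs_path, tmp)
            run(get_cmd)                         # copy image out of HDFS
            result = run([executable, local_path])  # process the local copy
        yield hdfs_path, result.decode("utf-8", "replace").strip()


# Only act as a mapper when an executable was named on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for path, output in run_mapper(sys.stdin, sys.argv[1]):
        print("%s\t%s" % (path, output))
```

Both this script and the executable would be passed to the job with
-file so they are available on the task nodes.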

Good luck


On 29 November 2010 22:49, Shrijeet Paliwal <shrijeet@rocketfuel.com> wrote:
> This gentleman here (see below) is doing some Hadoop Streaming magic and
> seems to be playing with image features in a MapReduce-like way. It's not
> using Hadoop's Java API though, so no help there.
> Still, you can check whether the article gives you some clues:
> http://techportal.ibuildings.com/2009/11/02/precision-color-searching-with-gmagick-and-amazon-elastic-mapreduce/
> PS: Pardon if the motivation in the article is orthogonal to yours.
> -Shrijeet
> On Mon, Nov 29, 2010 at 2:13 PM, Aravinth Bheemaraj
> <b.aravinth@gmail.com> wrote:
>> Michael, thanks a lot for your reply.
>> I have to compare the images based on pixels. So is it possible to process
>> the image based on pixel values rather than XML records?
>> I have read somewhere that the class "InputFormat" can be customized to
>> handle images by extending "InputSplit" and "RecordReader". But I am unsure
>> of the methods which are to be overridden so that I can access pixels of the
>> image. Is there any way you can help me with this?
>> Regarding the note, I am reading in a directory with multiple image files.
>> On Mon, Nov 29, 2010 at 4:08 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>>> Hi,
>>> The short answer is yes you can process images in Hadoop.
>>> Think of the image as a multi-line byte stream.
>>> As to an existing class, I don't believe that it exists, but shouldn't be
>>> too difficult to cobble.
>>> (If you can read in XML records for processing you should be able to read
>>> in a file containing a series of images.)
>>> Note: I'm assuming that you're either reading in a directory w multiple
>>> image files, or an image file w multiple images. Otherwise you probably
>>> don't want to use Hadoop.
>>> > Date: Mon, 29 Nov 2010 14:56:35 -0500
>>> > Subject: Image as input to M-R in Hadoop
>>> > From: b.aravinth@gmail.com
>>> > To: general@hadoop.apache.org
>>> >
>>> > Hi,
>>> >
>>> > I am a beginner to Hadoop and I am looking for some help in implementing the
>>> > Mapper with an image as input. Is there any predefined Writable class for
>>> > processing image? If so, how do I use it?
>>> >
>>> > Also I have read somewhere that compressed formats cannot be processed in
>>> > Hadoop. If this is true, am I making any sense in saying that the JPEG
>>> > images (which are also compressed format) cannot be processed by Hadoop?
>>> > Please correct me if I have misunderstood this concept.
>>> >
>>> > Thanks,
>>> > --
>>> > Aravinth
>> --
>> Aravinth Bheemaraj
>> University of Florida
