hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: Hadoop and image processing?
Date Thu, 03 Mar 2011 14:42:59 GMT

On Mar 3, 2011, at 1:23 AM, nigelsandever@btconnect.com wrote:

> How applicable would Hadoop be to the processing of thousands of large (60-100MB) 3D
> image files accessible via NFS, using a 100+ machine cluster?
> Does the idea have any merit at all?

It may be a good idea.  If you think the above is a viable architecture for data processing,
then you likely don't "need" Hadoop because your problem is small enough, or you spent way
too much money on your NFS server.

Whether or not you "need" Hadoop for data scalability - petabytes of data moved at gigabytes
a second - is a small aspect of the question.

Hadoop is a good data processing platform in its own right.  Traditional batch systems tend
to have very Unix-friendly APIs for data processing (you'll find yourself writing Perl scripts
that create text submit files, shell scripts, and C code), but they appear clumsy to "modern
developers" (this is speaking as someone who lives and breathes batch systems).  Hadoop has
"nice" Java APIs and is friendly to Java developers, has a lot of data processing concepts
built in compared to batch systems, and extends OK to other languages.
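
For a concrete sense of what those "nice" Java APIs look like, here is a minimal mapper
sketch.  It assumes a custom input format (call it WholeImageInputFormat -- not part of
stock Hadoop, and not shown here) that hands each map() call an entire image file as one
record, and analyze() is just a placeholder for whatever 3D processing you already do:

    import java.io.IOException;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical example: one whole image per record, delivered by a custom
    // InputFormat (WholeImageInputFormat, not shown, not part of stock Hadoop).
    public class ImageSummaryMapper extends Mapper<Text, BytesWritable, Text, Text> {

        @Override
        protected void map(Text filename, BytesWritable imageBytes, Context context)
                throws IOException, InterruptedException {
            // Run your existing 3D analysis on the raw bytes...
            String summary = analyze(imageBytes.getBytes());
            // ...and emit (filename, result); the framework handles the rest.
            context.write(filename, new Text(summary));
        }

        // Placeholder for real image-processing code.
        private String analyze(byte[] data) {
            return "bytes=" + data.length;
        }
    }

Splitting records, scheduling tasks across the cluster, and collecting output are all
handled for you; you only fill in the per-image logic.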

If you write your image processing in Java, it would be silly not to consider Hadoop.  If
you currently run a bag full of shell scripts and C++ code, it's a tougher decision to make.
