hadoop-common-user mailing list archives

From "Richard Whitehead" <richard.whiteh...@ieee.org>
Subject Building a distributed system
Date Mon, 18 Jul 2016 16:17:20 GMT

I wonder if the community can help me get started.

I’m trying to design the architecture of a project and I think that using some Apache Hadoop
technologies may make sense, but I am completely new to distributed systems and to Apache
(I am a very experienced developer, but my expertise is image processing on Windows!).

The task is very simple: call 3 or 4 executables in sequence to process some data.  The data
is just a simple image and the processing takes tens of minutes.
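To make the task concrete, the current single-machine processing is essentially a fixed chain of external programs, each consuming the previous stage's output file. A minimal sketch of that pattern (Python; the stage commands and file names here are invented for illustration, not our real executables):

```python
import subprocess

def process_image(stages, input_path, work_dir):
    """Run each pipeline stage in sequence.

    `stages` is a list of command prefixes; each stage is invoked as
    `<command> <input-file> <output-file>` and the next stage reads
    the file the previous one wrote.
    """
    current = input_path
    for i, cmd in enumerate(stages):
        output = f"{work_dir}/step{i}.out"
        # check=True aborts the whole pipeline if any stage fails
        subprocess.run(cmd + [current, output], check=True)
        current = output
    return current  # path to the final stage's output
```

Each run of this chain takes tens of minutes per image, which is why we only need to parallelise across images, not within one.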

We are considering a distributed architecture to increase throughput (latency does not matter), so we need a way to queue work on remote computers and a way to move the data around. The architecture will have to work on a single server, on a couple of servers in a rack, or in the cloud; 2 or 3 computers maximum.
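The throughput pattern we are after is just a shared job queue feeding a few workers, each of which runs the whole pipeline on one image. A minimal in-process sketch of that pattern (Python threads standing in for remote machines; in a real deployment the queue would be a distributed service and the worker body would invoke the pipeline executables):

```python
import queue
import threading

def worker(jobs, results):
    """Pull image jobs off the queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: shut this worker down
            break
        # Placeholder for the real multi-stage image pipeline
        results.put(item.upper())
        jobs.task_done()

jobs = queue.Queue()
results = queue.Queue()
workers = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(3)]
for t in workers:
    t.start()
for img in ["a", "b", "c"]:
    jobs.put(img)
jobs.join()              # wait until every job is processed
for _ in workers:
    jobs.put(None)       # one sentinel per worker
for t in workers:
    t.join()
```

The open question is whether YARN/HDFS is the right way to realise this pattern across 2-3 machines, or whether it is heavier than the problem warrants.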

Being new to all this I would prefer something simple rather than something super-powerful.

I was considering Hadoop YARN and HDFS; does this make sense? I'm assuming MapReduce would be over the top, is that the case?

Thanks in advance.
