hadoop-hdfs-user mailing list archives

From "Richard Whitehead" <richard.whiteh...@ieee.org>
Subject Building a distributed system
Date Mon, 18 Jul 2016 16:17:20 GMT
Hello, 

I wonder if the community can help me get started.

I’m trying to design the architecture of a project and I think that using some Apache Hadoop
technologies may make sense, but I am completely new to distributed systems and to Apache
(I am a very experienced developer, but my expertise is image processing on Windows!).

The task is very simple: call 3 or 4 executables in sequence to process some data.  The data
is just a simple image and the processing takes tens of minutes.
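To make the shape of the workload concrete, here is a minimal sketch of that kind of sequential pipeline on a single machine. The executable names are hypothetical stand-ins for the real processing stages:

```python
import os
import subprocess

# Hypothetical stage executables -- stand-ins for the real 3-4 tools.
STAGES = ["denoise.exe", "segment.exe", "measure.exe"]

def run_pipeline(input_path, work_dir):
    """Run each stage in sequence; each stage reads the previous stage's output.

    Assumes each executable takes an input path and an output path as its
    two arguments (an assumption for illustration, not a known interface).
    """
    current = input_path
    for i, exe in enumerate(STAGES):
        out = os.path.join(work_dir, "stage{}.img".format(i))
        # check=True aborts the whole pipeline if any stage fails.
        subprocess.run([exe, current, out], check=True)
        current = out
    return current
```

The distributed question is then essentially: how to farm many independent invocations of `run_pipeline` out across 2-3 machines and move the image files to and from them.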

We are considering a distributed architecture to increase throughput (latency does not matter).
So we need a way to queue work on remote computers, and a way to move the data around.  The
architecture will have to work on a single server, or on a couple of servers in a rack, or
in the cloud; 2 or 3 computers maximum.

Being new to all this I would prefer something simple rather than something super-powerful.

I was considering Hadoop YARN and HDFS; does this make sense?  I’m assuming MapReduce
would be over the top, is that the case?

Thanks in advance.

Richard