hadoop-mapreduce-user mailing list archives

From Matt Painter <m...@deity.co.nz>
Subject Suitability of HDFS for live file store
Date Mon, 15 Oct 2012 19:47:26 GMT
Hi,

I am a new Hadoop user, and would really appreciate your opinions on
whether Hadoop is the right tool for what I'm thinking of using it for.

I am investigating options for scaling an archive of around 100 TB of image
data. These images are typically TIFF files of around 50-100 MB each and
need to be made available online in real time. Access to the files will be
sporadic and occasional, but writing the files will be a daily activity.
Speed of write is not particularly important.

Our previous solution was a monolithic, expensive - and very full - SAN so
I am excited by Hadoop's distributed, extensible, redundant architecture.

My concern is that much of the discussion of, and many of the use cases
for, Hadoop are about data processing with MapReduce and - from what I
understand - using HDFS as the input for MapReduce jobs. My other concern is
the vague indication that it's not a 'real-time' system. We may be using
MapReduce in small components of the application, but it will most likely
be in file access analysis rather than any processing on the files
themselves.

In other words, what I really want is a distributed, resilient, scalable
filesystem.

Is Hadoop suitable if we just use this facility, or would I be misusing it
and inviting grief?

M
