hadoop-mapreduce-user mailing list archives

From Russ Ferriday <russ.ferri...@gmail.com>
Subject Re: Hadoop / HDFS equalivant but for realtime request handling / small files?
Date Tue, 01 Feb 2011 21:48:59 GMT
Hi Zachary,

Have you heard of Cassandra?
You may be able to write processing nodes that access data stored in Cassandra.
Probably the easiest configuration is to run both your processing functions
and a Cassandra node on each machine. Then, as you expand your computing
cluster, you also expand your Cassandra bandwidth.
This is not optimal, but it is very practical for a small project or small team.
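
As a rough illustration of the request path on such a layout, here is a
minimal Python sketch of scatter/gather with a deadline. Everything in it is
an assumption made for illustration: the node names, the NODE_DATA dictionary
(a stand-in for whatever each node would read from its local Cassandra
instance), and the process_on_node and handle_request functions are all
hypothetical, and no real Cassandra client calls are made. Threads stand in
for remote nodes.

```python
import concurrent.futures as cf

# Hypothetical stand-in for the data each node would read from its local
# Cassandra instance; no real Cassandra client is used here.
NODE_DATA = {
    "node-a": [3, 1, 4],
    "node-b": [1, 5, 9],
    "node-c": [2, 6, 5],
}


def process_on_node(node, data):
    """Hypothetical per-node processing function: sum the node's local values."""
    return sum(data)


def handle_request(node_data, deadline_s=4.0):
    """Scatter work to every node, then aggregate whatever finishes in time."""
    pool = cf.ThreadPoolExecutor(max_workers=len(node_data))
    # Scatter: one task per node, remembering which future belongs to which node.
    futures = {
        pool.submit(process_on_node, node, data): node
        for node, data in node_data.items()
    }
    # Gather: wait at most deadline_s seconds for the partial results.
    done, not_done = cf.wait(futures, timeout=deadline_s)
    partials = {futures[f]: f.result() for f in done}
    # Do not block the HTTP response on stragglers.
    pool.shutdown(wait=False)
    return {
        "total": sum(partials.values()),
        "answered": sorted(partials),
        "missing": sorted(futures[f] for f in not_done),
    }


if __name__ == "__main__":
    print(handle_request(NODE_DATA))
```

The deadline_s default of 4.0 simply mirrors the upper end of the 1-4 second
target in the question. Reporting the nodes that missed the deadline lets the
HTTP layer decide whether a partial answer is acceptable.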

On Tue, Feb 1, 2011 at 1:21 PM, Zachary Kozick <zach@omniar.com> wrote:

> Hi all,
> I'm interested in creating a solution that leverages multiple computing
> nodes in an EC2 or Rackspace cloud environment in order to
> do massively parallelized processing in the context of serving HTTP
> requests, meaning I want results to be aggregated within 1-4 seconds.
> From what I gather, Hadoop is designed for job-oriented tasks and the
> minimum job completion time is 30 seconds.  Also, HDFS is meant for storing
> a few large files, as opposed to many small files.
> My question: is there a framework similar to Hadoop that is designed more
> for on-demand parallel computing?  What about a technology similar to HDFS
> that is better at moving around small files and making them available to
> slave nodes on demand?
