Mailing-List: mapreduce-user@hadoop.apache.org; run by ezmlm
Reply-To: mapreduce-user@hadoop.apache.org
From: Zachary Kozick
Date: Tue, 1 Feb 2011 14:21:04 -0700
Subject: Hadoop / HDFS equivalent, but for realtime request handling / small files?
To: mapreduce-user@hadoop.apache.org

Hi all,

I'm interested in creating a solution that leverages multiple computing nodes in an EC2 or Rackspace cloud environment to do massively parallelized processing in the context of serving HTTP requests, meaning I want results to be aggregated within 1-4 seconds.

From what I gather, Hadoop is designed for batch, job-oriented tasks, and the minimum job completion time is around 30 seconds. HDFS is also meant for storing a small number of large files, as opposed to many small files.

My question: is there a framework similar to Hadoop that is designed more for on-demand parallel computing? And is there a technology similar to HDFS that is better at moving small files around and making them available to slave nodes on demand?
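For what it's worth, the pattern I have in mind is roughly the scatter-gather sketch below: fan a request out to all worker nodes, then aggregate whatever partial results come back before a deadline expires. The `query_node` function is made up for illustration (in practice it would be an HTTP/RPC call to a real worker node), and the deadline handling is just one possible policy:

```python
import concurrent.futures
import time

def query_node(node_id):
    """Hypothetical per-node work; simulated so the sketch is self-contained.
    In a real deployment this would be an HTTP or RPC call to a worker node."""
    time.sleep(0.1)  # simulated processing latency
    return {"node": node_id, "count": node_id * 10}

def scatter_gather(node_ids, deadline_seconds):
    """Fan the request out to every node concurrently and aggregate
    whatever results arrive before the deadline expires."""
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(node_ids)) as pool:
        futures = [pool.submit(query_node, n) for n in node_ids]
        try:
            for fut in concurrent.futures.as_completed(futures, timeout=deadline_seconds):
                results.append(fut.result())
        except concurrent.futures.TimeoutError:
            pass  # serve a partial aggregate rather than blow the deadline
    return sum(r["count"] for r in results)

total = scatter_gather([1, 2, 3, 4], deadline_seconds=2.0)
print(total)
```

The key property I'm after is that the caller gets *something* back within the 1-4 second budget, even if some nodes are slow, which is the opposite of Hadoop's run-to-completion job model.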