hbase-user mailing list archives

From Andrew Purtell <apurt...@yahoo.com>
Subject Re: HBase with EMR
Date Sat, 03 Mar 2012 17:21:18 GMT
I think there are a couple of things conflated here. Let me make four brief points and then
feel free to follow up where you would like more information. 

1) Many run HBase (and self-hosted Hadoop) on EC2. These clusters have their own HDFS on EBS
or instance store volumes. 

2) You cannot run HBase backed by S3. Search the HBase user list archives for earlier emails
on the subject. But this of course does not mean you cannot run HBase on EC2. (See point 1.)

3) Your EMR jobs can talk to your other EC2 resources, such as an HBase cluster running off
to the side.

4) You can perform custom setup-time actions for your EMR clusters, which can set up HBase
to run (using the cluster's HDFS file system). Then your EMR job has a transient HBase for
doing things like holding large intermediate representations (sparse matrix or whatever) that
require random access. Of course here when the EMR job is complete, everything will be torn
down.
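
As a rough sketch of what points 3 and 4 involve on the configuration side: an HBase client locates its cluster through ZooKeeper, so EMR tasks need an hbase-site.xml naming the quorum of the external cluster, while a transient HBase started on the EMR cluster needs its root directory pointed at that cluster's HDFS. The property names below are standard HBase settings; every host name, the NameNode port, and the helper function are placeholders made up for illustration, not values taken from this thread.

```python
import xml.etree.ElementTree as ET


def hbase_site_xml(properties):
    """Render a dict of property names and values as hbase-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Point 3: client-side settings so EMR tasks can find an HBase cluster
# running elsewhere on EC2. Host names are hypothetical.
client_side = {
    "hbase.zookeeper.quorum": "zk1.example.internal,zk2.example.internal",
    "hbase.zookeeper.property.clientPort": 2181,
}

# Point 4: settings for a transient HBase on the EMR cluster's own HDFS.
# The NameNode host and port are hypothetical placeholders.
transient = {
    "hbase.rootdir": "hdfs://emr-master.example.internal:9000/hbase",
    "hbase.cluster.distributed": "true",
    "hbase.zookeeper.quorum": "emr-master.example.internal",
}

if __name__ == "__main__":
    print(hbase_site_xml(client_side))
    print(hbase_site_xml(transient))
```

For point 3 the EMR nodes also need network access (for example via EC2 security group rules) to the ZooKeeper ensemble and to the HBase master and region servers. For point 4 the generated file would be written out by whatever setup-time action installs and starts HBase.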

Best regards,

    - Andy

On Mar 3, 2012, at 3:45 AM, Mohit Gupta <success.mohit.gupta@gmail.com> wrote:

> Hi,
> I am a bit confused about using HBase with EMR. In one of the previous
> threads (and in the EMR documentation,
> http://aws.amazon.com/elasticmapreduce/), it is said that S3 is the
> only option available to be used as
> source/destination at the moment. But I have come across a couple of blogs
> saying that people are actually using HBase with EMR (one is
> http://whynosql.com/why-we-run-our-hbase-on-ec2/ ).
> I have a scenario where running EMR with HBase would be really useful.
> Please let me know if it's possible or if there is any workaround available
> for this (like first transferring the data to S3 and then to EMR).
> -- 
> Best Regards,
> Mohit Gupta
> Software Engineer at Vdopia Inc.
