hbase-user mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: Questions about data distribution in HBase
Date Sat, 27 Mar 2010 17:54:44 GMT
I would consider option 3) if it were me (I am not an expert).  It is
common to use HBase tables as the input format for MapReduce jobs.
I don't think it is as simple as assuming that the 3 videos will be
spread over 3 machines when they are stored, but as the volume grows
the data will certainly be distributed, and with MR the processing
will try to run as close to the data as possible.
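
To make the "run as close to the data as possible" point concrete,
here is a toy Python sketch. It is not HBase or Hadoop code: the
function name, region names and host names are all made up for
illustration. It assumes one map task per region and a scheduler that
prefers the host serving that region, falling back to another host
only when the preferred one has no free slot, and it assumes there are
at least as many free slots as regions.

```python
# Toy model only: none of these names are HBase or Hadoop APIs.
def plan_map_tasks(region_hosts, free_slots):
    """Assign one map task per region, preferring the host that serves it.

    region_hosts: dict of region name -> host currently serving that region
    free_slots:   dict of host -> number of map tasks it can still run
    Returns a dict of region name -> (chosen host, True if data-local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    plan = {}
    for region, host in sorted(region_hosts.items()):
        if slots.get(host, 0) > 0:
            # The host serving the region has capacity: run the task there.
            chosen, local = host, True
        else:
            # Otherwise pick the host with the most free slots;
            # the region's data then has to travel over the network.
            chosen = max(sorted(slots), key=lambda h: slots[h])
            local = False
        slots[chosen] -= 1
        plan[region] = (chosen, local)
    return plan


if __name__ == "__main__":
    # Hypothetical layout: three regions of a "videos" table on two hosts.
    regions = {"videos,row00": "rs1", "videos,row04": "rs2", "videos,row08": "rs2"}
    slots = {"rs1": 1, "rs2": 1, "rs3": 2}
    for region, (host, local) in plan_map_tasks(regions, slots).items():
        print(region, "->", host, "(local)" if local else "(remote)")
```

In this made-up layout two tasks run on the host that holds their
data, and the third is sent to rs3 because rs2 has no free slot left,
so only that one region's data would cross the network.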


On Sat, Mar 27, 2010 at 6:06 PM, William Kang <weliam.cloud@gmail.com> wrote:
> Hi,
> I am quite confused about the distribution of data in an HBase system.
> For instance, if I store 10 videos in the cells of 10 HTable rows, I assume
> that these 10 videos will be stored on different data nodes (regionservers)
> in HBase. Now, if I write a program that processes these 10 videos in
> parallel, what is going to happen?
> Since I only deployed the program in a jar to the master server in HBase,
> will all videos in the HBase system have to be transferred to the master
> server to get processed?
> 1. Or do I have another option to assign where the computing should happen,
> so I do not have to transfer the data over the network and can use the
> region server's CPU for the processing?
> 2. Or should I deploy the program jar to each region server so the region
> server can use its local CPU on the local data? Will the HBase system do
> that automatically?
> 3. Or do I need to plug M/R into HBase in order to use the local data and
> parallelize the processing?
> Many thanks.
> William
