hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: Data Processing in hbase
Date Wed, 22 Jul 2009 06:54:35 GMT
HBase is meant to store large tables. The intention is to store data in a
way that's more scalable than traditional database systems. HBase is built
on top of Hadoop and can optionally be used as the data store for MR jobs.
However, that's not its only purpose.

In all data storage systems (except embedded databases), you have to fetch
the data to wherever the computation is performed. The MR design philosophy
is the opposite: take the code to the data and execute it as close as
possible to where the data is stored.
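
To make the difference concrete, here is a toy sketch in plain Python. It
does not use the HBase or Hadoop APIs; RegionServer, client_side_sum and
map_reduce_sum are hypothetical names invented for illustration, and the
"network" is just a counter. It assumes the job is a simple sum over all
rows, and it ignores the fact that real MR locality is best effort.

```python
from dataclasses import dataclass


@dataclass
class RegionServer:
    """Hypothetical stand-in for a node that holds one region's rows."""
    name: str
    rows: dict  # row key -> integer value


def client_side_sum(servers):
    """Scanner-style access: every row is shipped to the client machine."""
    rows_moved = 0
    total = 0
    for server in servers:
        for value in server.rows.values():
            rows_moved += 1  # one row crosses the network
            total += value   # the work happens on the client
    return total, rows_moved


def map_reduce_sum(servers):
    """MR-style access: each node sums its own rows, only partial sums move."""
    partials = [sum(server.rows.values()) for server in servers]
    return sum(partials), len(partials)  # one partial result per node


servers = [
    RegionServer("rs1", {"row1": 10, "row2": 20}),
    RegionServer("rs2", {"row3": 30, "row4": 40, "row5": 50}),
]

client_total, client_moved = client_side_sum(servers)
mr_total, mr_moved = map_reduce_sum(servers)
print(client_total, client_moved)
print(mr_total, mr_moved)
```

Both approaches give the same answer. What changes is how much data leaves
the nodes that store it: every row in the first case, one small partial
result per node in the second.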


On Tue, Jul 21, 2009 at 11:48 PM, bharath vissapragada <
bharathvissapragada1990@gmail.com> wrote:

> That means .. it is not very useful to write java codes (using API) ..
> because any way it is not using the real power of hadoop (distributed
> processing) instead it has the overhead of fetching data from other
> machines right?
>
> On Wed, Jul 22, 2009 at 12:12 PM, Amandeep Khurana <amansk@gmail.com>
> wrote:
>
> > Yes.. Only if you use MR. If you are writing your own code, it'll pull
> > the records to the place where you run the code.
> >
> > On Tue, Jul 21, 2009 at 11:39 PM, Fernando Padilla <fern@alum.mit.edu>
> > wrote:
> >
> > > That is if you use Hadoop MapReduce right? Not if you simply access
> > > HBase through a standard api (like java)?
> > >
> > >
> > >
> > > On 7/21/09 9:49 PM, Amandeep Khurana wrote:
> > >
> > >> Bharath,
> > >>
> > >> The processing is done as local to the RS as possible. The first
> > >> attempt is at doing it local on the same node. If that's not
> > >> possible, it's done on the same rack.
> > >>
> > >> -ak
> > >>
> > >>
> > >> On Tue, Jul 21, 2009 at 9:43 PM, bharath vissapragada<
> > >> bharat_v@students.iiit.ac.in>  wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> I have one simple doubt in hbase ,
> > >>>
> > >>> Suppose i use a scanner to iterate through all the rows in the hbase
> > >>> and process the data in the table corresponding to those rows. Is the
> > >>> processing of that data done locally on the region server in which
> > >>> that particular region is located or is it transferred over network
> > >>> so that all the processing is done on a single machine on which that
> > >>> script runs!!
> > >>>
> > >>> thanks
> > >>>
> > >>>
> > >>
> >
>
