hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: Data Processing in hbase
Date Wed, 22 Jul 2009 07:12:07 GMT
On Wed, Jul 22, 2009 at 12:07 AM, bharath vissapragada <
bharathvissapragada1990@gmail.com> wrote:

> That means we have to stick to the principle of MR whenever we require
> efficient data processing.
> But MapReduce cannot offer solutions to general database problems, I guess!
>

I'd recommend you read up on the papers on MR and BigTable, and on some of
the latest work such as HadoopDB. That'll give you clarity.


>
> On Wed, Jul 22, 2009 at 12:34 PM, Amandeep Khurana <amansk@gmail.com>
> wrote:
>
> > On Wed, Jul 22, 2009 at 12:01 AM, bharath vissapragada <
> > bharathvissapragada1990@gmail.com> wrote:
> >
> > > Suppose I write non-MR code using the Java API that involves
> > > processing huge data (100s of GBs). Is there then an overhead of
> > > fetching such a huge amount of data from other machines?
> >
> >
> > Of course. Network and I/O overheads definitely plague processing large
> > datasets.
> >
> >
> > >
> > >
> > > On Wed, Jul 22, 2009 at 12:24 PM, Amandeep Khurana <amansk@gmail.com>
> > > wrote:
> > >
> > > > HBase is meant to store large tables. The intention is to store data
> > > > in a way that's more scalable than traditional database systems.
> > > > Now, HBase is built over Hadoop and has the option of being used as
> > > > the data store for MR jobs. However, that's not the only purpose.
> > > >
> > > > In all data storage systems (except embedded databases), you would
> > > > have to fetch data to where computation has to be performed. The whole
> > > > MR design philosophy is to take the code to the data and execute it as
> > > > close to where the data is stored as possible.
> > > >
> > > >
> > > > On Tue, Jul 21, 2009 at 11:48 PM, bharath vissapragada <
> > > > bharathvissapragada1990@gmail.com> wrote:
> > > >
> > > > > That means it is not very useful to write Java code (using the
> > > > > API), because it does not use the real power of Hadoop (distributed
> > > > > processing) and instead has the overhead of fetching data from other
> > > > > machines, right?
> > > > >
> > > > > On Wed, Jul 22, 2009 at 12:12 PM, Amandeep Khurana
> > > > > <amansk@gmail.com> wrote:
> > > > >
> > > > > > Yes.. Only if you use MR. If you are writing your own code, it'll
> > > > > > pull the records to the place where you run the code.
> > > > > >
> > > > > > On Tue, Jul 21, 2009 at 11:39 PM, Fernando Padilla
> > > > > > <fern@alum.mit.edu> wrote:
> > > > > >
> > > > > > > That is if you use Hadoop MapReduce, right? Not if you simply
> > > > > > > access HBase through a standard API (like Java)?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 7/21/09 9:49 PM, Amandeep Khurana wrote:
> > > > > > >
> > > > > > >> Bharath,
> > > > > > >>
> > > > > > > >> The processing is done as local to the RS as possible. The
> > > > > > > >> first attempt is at doing it locally on the same node. If
> > > > > > > >> that's not possible, it's done on the same rack.
> > > > > > >>
> > > > > > >> -ak
> > > > > > >>
> > > > > > >>
> > > > > > >> On Tue, Jul 21, 2009 at 9:43 PM, bharath vissapragada<
> > > > > > >> bharat_v@students.iiit.ac.in>  wrote:
> > > > > > >>
> > > > > > >>  Hi all,
> > > > > > >>>
> > > > > > > >>> I have one simple doubt about HBase.
> > > > > > > >>>
> > > > > > > >>> Suppose I use a scanner to iterate through all the rows in
> > > > > > > >>> the HBase table and process the data corresponding to those
> > > > > > > >>> rows. Is the processing of that data done locally on the
> > > > > > > >>> region server on which that particular region is located, or
> > > > > > > >>> is it transferred over the network so that all the processing
> > > > > > > >>> is done on the single machine on which that script runs?
> > > > > > >>>
> > > > > > >>> thanks
> > > > > > >>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
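
The distinction discussed in this thread (a client-side scan pulls every
record to the machine running the code, while MR runs the code next to the
data) can be illustrated with a toy model. This is only a sketch under
stated assumptions: REGIONS, client_side_sum and mr_style_sum are
hypothetical names invented for illustration, not HBase or Hadoop APIs, and
"transfer" is simply counted rather than measured.

```python
# Toy model only: none of these names are HBase or Hadoop APIs.

# Hypothetical layout: each region server holds a few rows (row key -> value).
REGIONS = {
    "rs1": {"row1": 10, "row2": 20},
    "rs2": {"row3": 30, "row4": 40},
    "rs3": {"row5": 50},
}


def client_side_sum(regions):
    """Scanner-style: every row is shipped to the single client machine."""
    rows_transferred = 0
    total = 0
    for rows in regions.values():
        for value in rows.values():
            rows_transferred += 1  # this row crosses the network to the client
            total += value
    return total, rows_transferred


def mr_style_sum(regions):
    """MR-style: each server sums its own rows; only partial sums travel."""
    # The "map" step runs next to the data, one partial result per server.
    partials = [sum(rows.values()) for rows in regions.values()]
    values_transferred = len(partials)
    return sum(partials), values_transferred


if __name__ == "__main__":
    print(client_side_sum(REGIONS))
    print(mr_style_sum(REGIONS))
```

Both functions compute the same total; the difference is how many values
have to move. In the toy model the client-side version moves one value per
row, while the MR-style version moves one value per server.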
