hbase-user mailing list archives

From William Kang <weliam.cl...@gmail.com>
Subject Re: Parallel computing on HBase
Date Thu, 07 Oct 2010 06:08:03 GMT
Ryan, thanks for your explanation. It is very clear and helpful.

Andy, I think HBASE-2000 is exactly what I was asking for. In general, MR is
not built for low-latency use. But our applications do need something fast
and lightweight. For example, we might just want to know the mean of some
values inside the rows returned by a query. If each region server could
calculate the mean over the rows it holds instead of transporting every row
back to the client, we would get the final result much faster. Will
HBASE-2000 be able to do this? And could you share more information about
the development process and how I may contribute to it? Many thanks.


William
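A minimal sketch of the per-region computation described above, assuming each region returns a partial aggregate (sum, count) rather than its own mean. Region contents are simulated with in-memory lists; `RegionMean`, `Partial`, `aggregateRegion`, and `globalMean` are hypothetical names, and no HBase API is used. Returning the (sum, count) pair matters: simply averaging the per-region means gives the wrong answer whenever regions hold different numbers of rows.

```java
import java.util.List;

/**
 * Sketch: computing a mean "near the data" by having each region send back a
 * small partial aggregate instead of its raw rows. Regions are simulated with
 * in-memory lists; this is not HBase code.
 */
public class RegionMean {

    /** Hypothetical partial result one region would return to the client. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        /** Combining two partials is just adding their fields. */
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    /** Work done locally on one region: a single pass over its values. */
    static Partial aggregateRegion(List<Double> regionValues) {
        double sum = 0.0;
        for (double v : regionValues) {
            sum += v;
        }
        return new Partial(sum, regionValues.size());
    }

    /** Client side: merge the per-region partials, then divide once. */
    static double globalMean(List<List<Double>> regions) {
        Partial total = new Partial(0.0, 0L);
        for (List<Double> region : regions) {
            total = total.merge(aggregateRegion(region));
        }
        if (total.count == 0) {
            throw new IllegalArgumentException("no rows to average");
        }
        return total.sum / total.count;
    }

    public static void main(String[] args) {
        List<List<Double>> regions = List.of(
                List.of(1.0, 2.0, 3.0), // region A: 3 rows
                List.of(10.0));         // region B: 1 row
        System.out.println(globalMean(regions)); // prints 4.0 (16.0 / 4)
    }
}
```

With regions holding {1, 2, 3} and {10}, the merged result is 16 / 4 = 4.0, whereas averaging the two region means would give (2 + 10) / 2 = 6.0. Only two numbers per region cross the network, regardless of how many rows each region holds.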

On Wed, Oct 6, 2010 at 11:57 AM, Andrew Purtell <apurtell@apache.org> wrote:

> Hi William,
>
> I think you are asking about HBASE-2000:
> https://issues.apache.org/jira/browse/HBASE-2000
>
> Work on an in-process parallel execution framework for HBase is in
> progress, yes. We have some initial patches up for review which are the
> start of this.
>
> Best regards,
>
>    - Andy
>
>
> --- On Tue, 10/5/10, Ryan Rawson <ryanobjc@gmail.com> wrote:
>
> > From: Ryan Rawson <ryanobjc@gmail.com>
> > Subject: Re: Parallel computing on HBase
> > To: user@hbase.apache.org
> > Date: Tuesday, October 5, 2010, 11:10 PM
> > You understand the hbase data model, yes? Each region gets a
> > mapper, and each mapper reads the rows for that region, feeding
> > them into the map functions. On the output side, each reducer just
> > writes to hbase. The parallelism can support millions of row
> > reads/second.
> >
> > I don't understand the rest of your question unfortunately.
> >
> > good luck!
> > -ryan
> >
> > On Tue, Oct 5, 2010 at 9:40 PM, William Kang <weliam.cloud@gmail.com> wrote:
> > > Can you tell me a little about how HBase works with MR? If the MR
> > > source/sink has to go through just ONE client, then it is not what I
> > > am looking for. But if MR can plug directly into the region server
> > > containing specific rows, then it might work. Furthermore, MR is a
> > > heavyweight process with lots of overhead. Ideally, we want something
> > > lightweight that can get results fast. Many thanks.
> > >
> > >
> > > William
> > >
> > > On Wed, Oct 6, 2010 at 12:01 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
> > >
> > >> You can combine MapReduce with HBase for parallel computing.
> > >>
> > >>
> > >>
> > >> On Wed, Oct 6, 2010 at 11:24 AM, William Kang <weliam.cloud@gmail.com> wrote:
> > >> > Hi guys,
> > >> > Is there any project going on for co-processing on region servers?
> > >> > Right now, we have to transfer all data from the region servers to
> > >> > the client after a query, is that right? This can be slow.
> > >> > Furthermore, the CPUs on the region servers are not fully used. If
> > >> > we could distribute the computation along with the data on the
> > >> > region servers, that would be really handy for some problems. Is it
> > >> > possible to do so? Many thanks.
> > >> >
> > >> >
> > >> > William
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Best Regards
> > >>
> > >> Jeff Zhang
> > >>
> > >
> >
>
>
>
>
>
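Ryan's description of the MapReduce integration (one mapper per region, each reading only that region's rows) can be modeled with the small simulation below. It rests on stated assumptions: the region boundaries and row keys are made up, `String` keys stand in for HBase's byte-array row keys (the ordering matches for ASCII keys), and `RegionSplits`, `regionFor`, and `splitByRegion` are hypothetical names, not HBase's own input-format code. A row belongs to the region with the greatest start key less than or equal to the row key, which `TreeMap.floorKey` finds directly; the first region's start key is the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simulation of "each region gets a mapper": row keys are partitioned by
 * region start key, and each region's rows form one independent split.
 * Not HBase code; keys and boundaries are illustrative only.
 */
public class RegionSplits {

    /** Start key of the region holding rowKey (greatest start key <= rowKey). */
    static String regionFor(TreeMap<String, List<String>> splits, String rowKey) {
        return splits.floorKey(rowKey);
    }

    /** Groups row keys into one split per region; regionStartKeys must include "". */
    static TreeMap<String, List<String>> splitByRegion(List<String> regionStartKeys,
                                                       List<String> rowKeys) {
        TreeMap<String, List<String>> splits = new TreeMap<>();
        for (String start : regionStartKeys) {
            splits.put(start, new ArrayList<>());
        }
        for (String row : rowKeys) {
            splits.get(regionFor(splits, row)).add(row);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three hypothetical regions: ["", "g"), ["g", "p"), ["p", end)
        List<String> starts = List.of("", "g", "p");
        List<String> rows = List.of("apple", "grape", "kiwi", "pear", "zucchini");
        for (Map.Entry<String, List<String>> e : splitByRegion(starts, rows).entrySet()) {
            // In a real job, each entry would be handed to its own map task.
            System.out.println("region starting at '" + e.getKey() + "' -> " + e.getValue());
        }
    }
}
```

Here "apple" lands in the first region, "grape" and "kiwi" in the region starting at "g", and "pear" and "zucchini" in the region starting at "p". Because the splits are disjoint, each can be processed by a separate task without coordination.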
