hbase-user mailing list archives

From "Naama Kraus" <naamakr...@gmail.com>
Subject Re: HBase and locality issues
Date Mon, 16 Jun 2008 03:22:31 GMT
Thanks for the helpful information.

Naama

On Mon, Jun 16, 2008 at 12:17 AM, Jim Kellerman <jim@powerset.com> wrote:

> Comments inline below.
>
> ---
> Jim Kellerman, Senior Engineer; Powerset
>
>
> > -----Original Message-----
> > From: Naama Kraus [mailto:naamakraus@gmail.com]
> > Sent: Sunday, June 15, 2008 3:39 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: HBase and locality issues
> >
> > Hi,
> >
> > I have some questions regarding HBase and locality issues -
> > I'd appreciate some explanations and clarifications.
> >
> > I understand HBase is built on top of HDFS.
> > Say an HRegionServer creates an HStoreFile where it puts some
> > column family content. Does HDFS split the file into multiple
> > HDFS blocks and distribute them across a bunch of machines?
>
> Yes. HStoreFile is currently implemented using org.apache.hadoop.io.MapFile.
>
> > If that's the case, when the region server needs to actually
> > access the files, does HDFS underneath communicate with remote
> > machines to read the various blocks?
>
> Sometimes. If a copy of the requested block is stored locally, HDFS will
> try to read that one.
>
> > Doesn't it hurt performance, since there is no locality in data access
> > (the region server actually works on remote blocks)?
>
> Somewhat. We have identified other, larger performance bottlenecks that
> need to be addressed first.
>
> > Or is the HStoreFile implemented in some other way that
> > writes it to the local disks of the region server
> > machine that owns it?
>
> No. Blocks are placed according to HDFS strategies.
>
> > If so, then how? Does this code override the HDFS behavior?
>
> It doesn't.
>
> > Another related question is about MapReduce and HBase. When
> > a MapReduce job runs on top of HBase, i.e. gets a table as
> > input, how does the MapReduce framework know how to
> > schedule map tasks near the data? Does it have any knowledge of
> > the actual location of the data pieces composing the table to
> > be processed?
>
> No. It is on our list of things to do. See HBASE-57.
>
> > I'd also be glad to get pointers to the related source code (classes).
> >
> > Thanks for any information,
> > Naama
> >
> > --
> > oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00
> > oo 00 oo 00 oo 00 oo 00 oo "If you want your children to be
> > intelligent, read them fairy tales. If you want them to be
> > more intelligent, read them more fairy tales." (Albert
> > Einstein)
>
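
To make the locality question in the quoted thread concrete, here is a minimal sketch. It is not HBase or HDFS code: the class LocalitySketch, its methods, the block ids, and the node names are all hypothetical and exist only for illustration. It assumes you already know which hosts hold a replica of each block of a store file (in current Hadoop releases that information can be queried with FileSystem.getFileBlockLocations; here it is hardcoded). Given that layout, it estimates what fraction of blocks a region server on a given host could read locally, and which host a locality-aware scheduler might prefer for a task that reads the whole file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalitySketch {

    /** Fraction of blocks that have at least one replica on the given host. */
    static double localFraction(Map<String, List<String>> blockHosts, String host) {
        if (blockHosts.isEmpty()) {
            return 0.0;
        }
        int local = 0;
        for (List<String> hosts : blockHosts.values()) {
            if (hosts.contains(host)) {
                local++;
            }
        }
        return (double) local / blockHosts.size();
    }

    /**
     * Host that holds replicas of the most blocks (ties go to the host seen
     * first). Assumes each block lists a host at most once. Returns null if
     * the layout is empty.
     */
    static String bestHost(Map<String, List<String>> blockHosts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> hosts : blockHosts.values()) {
            for (String h : hosts) {
                counts.merge(h, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Made-up layout: four blocks of one file, three replicas each. */
    static Map<String, List<String>> sampleLayout() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("blk_1", Arrays.asList("node-a", "node-b", "node-c"));
        m.put("blk_2", Arrays.asList("node-b", "node-d", "node-e"));
        m.put("blk_3", Arrays.asList("node-b", "node-c", "node-f"));
        m.put("blk_4", Arrays.asList("node-a", "node-d", "node-f"));
        return m;
    }

    public static void main(String[] args) {
        Map<String, List<String>> layout = sampleLayout();
        // A region server on node-a only finds some of the blocks locally.
        System.out.println("local fraction on node-a: " + localFraction(layout, "node-a"));
        // A scheduler that knew the layout could prefer the best-covered host.
        System.out.println("best host for the whole file: " + bestHost(layout));
    }
}
```

In this made-up layout no single host holds every block, so whichever machine serves the file will read some blocks over the network. That matches the "Sometimes" and "Somewhat" answers in the quoted reply.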



