hbase-user mailing list archives

From Igor Katkov <ikat...@gmail.com>
Subject Re: HDFS data locality
Date Wed, 18 Nov 2009 04:37:08 GMT
> So when you write a file to HDFS, you first write on the local
>  Datanode then it's streamed to other DNs

This is where I'm confused. Let's assume a typical deployment where the
datanode and regionserver daemons co-exist on the same hosts.
A regionserver talks to whichever datanodes it likes, not necessarily the
local daemon. At compaction it writes to an abstract HDFS location, i.e. the
namenode decides which physical host will accept these bytes; again, that is
not necessarily the regionserver's own host.

> so the new files created in the regions are on the same node
The new files must be on HDFS, so how do they get written to the local HDFS
daemon?


On Tue, Nov 17, 2009 at 5:51 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> The master doesn't assign regions based on locality, we rely on the way
> HDFS works. Also, it's almost impossible to assign regions based on
> locality as all the files could be on a different node and moving them
> around for the sake of locality would mean moving around possibly
> hundreds of GB...
>
> So when you write a file to HDFS, you first write on the local
> Datanode then it's streamed to other DNs. If you have a pretty normal
> production cluster that stays up 24/7, the regions won't move around
> so the new files created in the regions are on the same node. Also,
> every 24 hours the major compaction thread rewrites all store files
> into one (if needed) for each family and, again, you get locality.
>
> J-D
>
> On Tue, Nov 17, 2009 at 2:43 PM, Igor Katkov <ikatkov@gmail.com> wrote:
> > Hi,
> >
> > When HMaster assigns regions to region servers, does it try to ensure that
> > these files will be located on the same host in HDFS? It does not, does it?
> > So most likely HBase RegionServers are very chatty over the network,
> > reading and writing from/to the HDFS daemons on other nodes.
> >
> > Is there a way to improve it? To make a RegionServer talk mostly to the
> > local DataNode only?
> >
>
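
To illustrate the write path J-D describes above: when the writing client
(here, the regionserver) runs on a host that also runs a DataNode, HDFS's
default placement puts the first replica on that local DataNode and the
remaining replicas on other nodes. The Python sketch below is only a toy
model of that behaviour, not HDFS or HBase code. The function names
(choose_replica_nodes, locality, major_compact) and the host names are made
up for illustration, and rack awareness is ignored.

```python
import random


def choose_replica_nodes(writer_host, datanodes, replication=3, rng=random):
    """Toy model of replica placement (assumption: no rack awareness).

    If the writer's host runs a DataNode, the first replica goes there;
    the remaining replicas go to other DataNodes picked at random.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    targets = []
    if writer_host in datanodes:
        targets.append(writer_host)  # local DataNode gets the first replica
    others = [dn for dn in datanodes if dn not in targets]
    rng.shuffle(others)
    targets.extend(others[: replication - len(targets)])
    return targets


def locality(region_host, store_files):
    """Fraction of store files that have a replica on region_host."""
    if not store_files:
        return 1.0
    local = sum(1 for replicas in store_files if region_host in replicas)
    return local / len(store_files)


def major_compact(region_host, datanodes, replication=3, rng=random):
    """Rewrite all store files into one new file written from region_host."""
    return [choose_replica_nodes(region_host, datanodes, replication, rng)]


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # hypothetical hosts

    # The region is served from dn1, so every flushed file has a local replica.
    files = [choose_replica_nodes("dn1", nodes) for _ in range(4)]
    print("locality on dn1:", locality("dn1", files))

    # After the region moves to dn4, only some files happen to be local.
    print("locality on dn4 after move:", locality("dn4", files))

    # A major compaction on dn4 rewrites the data and restores locality.
    files = major_compact("dn4", nodes)
    print("locality on dn4 after major compaction:", locality("dn4", files))
```

In this model the regionserver never picks the local DataNode itself. The
first replica lands there only because the writer's host is also a DataNode,
which is why locality holds while regions stay put and comes back after a
major compaction.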
