hadoop-hdfs-dev mailing list archives

From Yongjun Zhang <yzh...@cloudera.com>
Subject Re: Controlling the block placement and the file placement in HDFS writes
Date Fri, 19 Dec 2014 22:20:05 GMT
Hi,

FYI,

A relevant JIRA, HDFS-6133, tries to tell the Balancer not to move the
blocks stored at the favored nodes that the application selected. I reviewed
the patch, and the latest one looks good to me. I hope some committers can
pick it up and push it forward.

Thanks.

--Yongjun


On Fri, Dec 19, 2014 at 1:52 PM, Ananth Gundabattula <
agundabattula@gmail.com> wrote:
>
> Hello Zhe,
>
> Thanks a lot for the inputs. Storage policies are really what I was looking
> for to address one of the problems.
>
> @Nick: I agree that it would be a nice feature to have. Thanks for the
> info.
>
> Regards,
> Ananth
>
> On Fri, Dec 19, 2014 at 10:49 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>
> > HBase would enjoy a similar functionality. In our case, we'd like all
> > replicas for all files in a given HDFS path to land on the same set of
> > machines. That way, in the event of a failover, regions can be assigned to
> > one of these other machines that has local access to all blocks for all
> > region files.
> >
> > On Thu, Dec 18, 2014 at 3:36 PM, Zhe Zhang <zhe.zhang.research@gmail.com>
> > wrote:
> > >
> > > > The second aspect is that our queries are time based and this time window
> > > > follows a familiar pattern of old data not being queried much. Hence we
> > > > would like to preserve the most recent data in the HDFS cache (Impala is
> > > > helping us manage this aspect via their command set) but we would like the
> > > > next recent amount of data chunks to land on an SSD that is present on
> > > > every datanode. The remaining set of blocks which are "very old but in
> > > > large quantities" would land on spinning disks. The decision to choose a
> > > > given volume is based on the file name as we can control the filename that
> > > > is being used to generate the file.
> > > >
> > >
> > > Have you tried the 'setStoragePolicy' command? It's part of the HDFS
> > > "Heterogeneous Storage Tiers" work and seems to address your scenario.
> > >
> > > > 1. Is there a way to control that all file blocks belonging to a particular
> > > > hdfs directory & file go to the same physical datanode (and their
> > > > corresponding replicas as well)?
> > >
> > > This seems inherently hard: the file/dir could have more data than a
> > > single DataNode can host. Implementation-wise, it requires some sort
> > > of a map in BlockPlacementPolicy from inode or file path to DataNode
> > > address.
> > >
> > > My 2 cents..
> > >
> > > --
> > > Zhe Zhang
> > > Software Engineer, Cloudera
> > > https://sites.google.com/site/zhezhangresearch/
> > >
> >
>
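
Zhe's pointer to 'setStoragePolicy' fits the file-name-driven tiering that
Ananth describes. What follows is a minimal sketch of the selection step only.
It assumes a hypothetical naming scheme in which each file name embeds an ISO
date, and it assumes the HDFS policy names ALL_SSD and HOT (the default,
disk-backed policy). The class name, method name, and window parameter are
illustrative and not part of HDFS; the code returns a policy name and never
talks to a cluster.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: maps a file name to a storage policy name by age. */
final class StorageTierChooser {
    // Assumption: the file name contains a date such as "2014-12-18".
    private static final Pattern DATE_IN_NAME =
            Pattern.compile("(\\d{4}-\\d{2}-\\d{2})");

    private StorageTierChooser() {}

    /**
     * Returns "ALL_SSD" for files whose embedded date is at most
     * ssdWindowDays old, and "HOT" (spinning disks) for everything else,
     * including names that carry no date.
     */
    static String policyFor(String fileName, LocalDate today, long ssdWindowDays) {
        Matcher m = DATE_IN_NAME.matcher(fileName);
        if (!m.find()) {
            return "HOT"; // no date in the name: fall back to the default tier
        }
        // LocalDate.parse throws DateTimeParseException for impossible dates.
        LocalDate fileDate = LocalDate.parse(m.group(1));
        long ageDays = ChronoUnit.DAYS.between(fileDate, today);
        return ageDays <= ssdWindowDays ? "ALL_SSD" : "HOT";
    }
}
```

The returned name would then be applied to the file or its parent directory
with the setStoragePolicy command mentioned above.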

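The map Zhe mentions, from file path to DataNode address, could be sketched as
a longest-prefix lookup. This is only an illustration under stated assumptions:
the class PathToDataNodeMap, its methods, and the host:port strings are
hypothetical, nothing here is wired into BlockPlacementPolicy, and the sketch
does not address the capacity problem raised above. Paths are assumed to be
absolute.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup table: directory path -> preferred DataNode addresses. */
final class PathToDataNodeMap {
    private final Map<String, List<String>> pinned = new HashMap<>();

    /** Pins every file under dirPath to the given DataNode addresses. */
    void pin(String dirPath, List<String> dataNodes) {
        pinned.put(normalize(dirPath),
                Collections.unmodifiableList(new ArrayList<>(dataNodes)));
    }

    /**
     * Walks from the file path up towards the root and returns the nodes of
     * the closest pinned ancestor, or an empty list if none is pinned.
     */
    List<String> targetsFor(String filePath) {
        String p = normalize(filePath);
        while (!p.isEmpty()) {
            List<String> nodes = pinned.get(p);
            if (nodes != null) {
                return nodes;
            }
            int slash = p.lastIndexOf('/');
            if (slash > 0) {
                p = p.substring(0, slash); // step up one directory
            } else if (p.length() > 1) {
                p = "/";                   // last stop: the root itself
            } else {
                p = "";                    // root already checked: give up
            }
        }
        return Collections.emptyList();
    }

    // Drops a single trailing slash so "/a/b/" and "/a/b" are the same key.
    private static String normalize(String path) {
        return path.length() > 1 && path.endsWith("/")
                ? path.substring(0, path.length() - 1)
                : path;
    }
}
```

For the HBase case Nick describes, a table directory would be pinned once to a
fixed set of machines, and every region file beneath it would resolve to that
same set. An empty result means the default placement applies.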