hadoop-hdfs-dev mailing list archives

From Ananth Gundabattula <agundabatt...@gmail.com>
Subject Re: Controlling the block placement and the file placement in HDFS writes
Date Fri, 19 Dec 2014 21:52:59 GMT
Hello Zhe,

Thanks a lot for the inputs. Storage policies are exactly what I was
looking for to solve one of the problems.

@Nick: I agree that it would be a nice feature to have. Thanks for the
info.

Regards,
Ananth

On Fri, Dec 19, 2014 at 10:49 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> HBase would enjoy a similar functionality. In our case, we'd like all
> replicas for all files in a given HDFS path to land on the same set of
> machines. That way, in the event of a failover, regions can be assigned to
> one of these other machines that has local access to all blocks for all
> region files.
>
> On Thu, Dec 18, 2014 at 3:36 PM, Zhe Zhang <zhe.zhang.research@gmail.com>
> wrote:
> >
> > > The second aspect is that our queries are time based, and this time
> > > window follows a familiar pattern of old data not being queried much.
> > > Hence we would like to preserve the most recent data in the HDFS cache
> > > (Impala is helping us manage this aspect via its command set), but we
> > > would like the next most recent chunks of data to land on an SSD that
> > > is present on every datanode. The remaining set of blocks, which are
> > > "very old but in large quantities", would land on spinning disks. The
> > > decision to choose a given volume is based on the file name, as we can
> > > control the filename that is being used to generate the file.
> > >
> >
> > Have you tried the 'setStoragePolicy' command? It's part of the HDFS
> > "Heterogeneous Storage Tiers" work and seems to address your scenario.
> >
> > > 1. Is there a way to control that all file blocks belonging to a
> > > particular hdfs directory & file go to the same physical datanode
> > > (and their corresponding replicas as well?)
> >
> > This seems inherently hard: the file/dir could have more data than a
> > single DataNode can host. Implementation-wise, it requires some sort
> > of a map in BlockPlacementPolicy from inode or file path to DataNode
> > address.
> >
> > My 2 cents..
> >
> > --
> > Zhe Zhang
> > Software Engineer, Cloudera
> > https://sites.google.com/site/zhezhangresearch/
> >
>
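For anyone finding this thread later: a typical invocation of the storage
policy CLI that Zhe mentions might look like the sketch below. The paths
are hypothetical, and the exact set of policies available depends on the
Hadoop version (ONE_SSD, ALL_SSD, etc. arrived with the Heterogeneous
Storage Tiers work in 2.6):

```shell
# List the storage policies the cluster supports
hdfs storagepolicies -listPolicies

# Pin a directory of recent data toward SSD-backed volumes.
# ONE_SSD keeps one replica on a volume tagged SSD and the
# remaining replicas on DISK; /data/recent is illustrative.
hdfs storagepolicies -setStoragePolicy -path /data/recent -policy ONE_SSD

# Older, bulky data can stay under the default HOT policy
# (all replicas on DISK, i.e. spinning disks), or be moved to
# COLD if the datanodes have volumes tagged ARCHIVE.
hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD

# Verify which policy applies to a path
hdfs storagepolicies -getStoragePolicy -path /data/recent
```

Note that setting a policy affects newly written blocks; existing blocks
are only moved once the mover tool (or a rewrite of the data) runs.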
